[
https://issues.apache.org/jira/browse/HDFS-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917463#comment-16917463
]
Surendra Singh Lilhore commented on HDFS-14786:
-----------------------------------------------
Thanks [~liuml07] for the JIRA; this idea LGTM.
{quote} - 4AZ: randomly choose three AZs and randomly choose one rack in every
AZ{quote}
I prefer this policy for better fault tolerance.
Are you going to consider EC as well? If we have 3 AZs and the EC policy is
3+2, then each AZ should contain at most 2 EC blocks, so that the data survives
the failure of one AZ.
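The arithmetic behind the EC point, as a minimal Java sketch (class and method names are hypothetical, not HDFS APIs): a k+m block group survives losing one AZ only if no AZ holds more than m blocks, and the best an even spread over z AZs can do is ceil((k+m)/z) blocks per AZ.

```java
/** Hypothetical sketch: can a k+m striped block group survive the loss of any one AZ? */
public final class EcAzCheck {

    /** Smallest achievable maximum blocks per AZ when spreading k+m blocks evenly over z AZs. */
    static int minMaxBlocksPerAz(int dataBlocks, int parityBlocks, int zones) {
        int total = dataBlocks + parityBlocks;
        return (total + zones - 1) / zones; // ceil(total / zones)
    }

    /** Losing one AZ is tolerable iff that AZ held at most parityBlocks blocks. */
    static boolean toleratesOneAzFailure(int dataBlocks, int parityBlocks, int zones) {
        return minMaxBlocksPerAz(dataBlocks, parityBlocks, zones) <= parityBlocks;
    }

    /** Round-robin assignment of block indices to AZs; per-AZ counts differ by at most one. */
    static int[] blocksPerAz(int dataBlocks, int parityBlocks, int zones) {
        int[] counts = new int[zones];
        for (int i = 0; i < dataBlocks + parityBlocks; i++) {
            counts[i % zones]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 3+2 over 3 AZs: counts [2, 2, 1], so one AZ failure loses at most 2 blocks.
        System.out.println(java.util.Arrays.toString(blocksPerAz(3, 2, 3))); // [2, 2, 1]
        System.out.println(toleratesOneAzFailure(3, 2, 3));  // true
        // 10+4 over 3 AZs: some AZ must hold 5 blocks, more than the 4 parity blocks.
        System.out.println(toleratesOneAzFailure(10, 4, 3)); // false
    }
}
```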
> A new block placement policy tolerating availability zone failure
> -----------------------------------------------------------------
>
> Key: HDFS-14786
> URL: https://issues.apache.org/jira/browse/HDFS-14786
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: block placement
> Reporter: Mingliang Liu
> Priority: Major
>
> {{NetworkTopology}} assumes a three-layer "/datacenter/rack/host" topology. The
> default block placement policies are rack-aware for better fault tolerance.
> Newer block placement policies like {{BlockPlacementPolicyRackFaultTolerant}}
> try their best to place the replicas on as many racks as possible, which
> tolerates more rack failures. HADOOP-8470 brought
> {{NetworkTopologyWithNodeGroup}} to add another layer under rack, i.e. the
> four-layer "/datacenter/rack/nodegroup/host" topology. With that, replicas
> within a rack can be placed in different node groups for better isolation.
> Existing block placement policies tolerate one rack failure since at least
> two racks are chosen in those cases. All replicas could still end up in the
> same datacenter, though, even when the cluster topology contains multiple
> datacenters. In other words, failures at layers above the rack are not well
> tolerated.
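To make the layer argument concrete, a minimal Java sketch (not HDFS code; class and method names are hypothetical) that counts how many distinct failure domains a set of replica paths spans at a given topology depth. Three replicas on two racks of one datacenter survive a rack failure but not a datacenter failure.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: how many failure domains do replicas span at a given layer? */
public final class FailureDomains {

    /** Ancestor of a host path at the given depth: depth 1 of "/dc1/rack2/host7" is "/dc1". */
    static String domainAt(String hostPath, int depth) {
        String[] parts = hostPath.substring(1).split("/"); // drop the leading '/'
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth && i < parts.length; i++) {
            sb.append('/').append(parts[i]);
        }
        return sb.toString();
    }

    /** Number of distinct ancestors at the given depth; >= 2 means one domain failure is survivable. */
    static int distinctDomains(List<String> replicaPaths, int depth) {
        Set<String> domains = new HashSet<>();
        for (String path : replicaPaths) {
            domains.add(domainAt(path, depth));
        }
        return domains.size();
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("/dc1/rack1/host1", "/dc1/rack2/host2", "/dc1/rack2/host3");
        System.out.println(distinctDomains(replicas, 2)); // 2 racks: survives one rack failure
        System.out.println(distinctDomains(replicas, 1)); // 1 datacenter: lost if dc1 fails
    }
}
```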
> However, more and more public cloud deployments leverage multiple
> availability zones (AZs) for high availability, since the inter-AZ latency
> seems affordable in many cases. Within a single AZ, some cloud providers like
> AWS support [partition placement
> groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition]
> which essentially act as different racks. A simple network topology mapped to
> HDFS is then the three-layer "/availabilityzone/rack/host".
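With rack locations of the form "/availabilityzone/rack", a policy first needs to know which racks belong to which AZ. A minimal Java sketch, assuming every rack location has exactly two levels; the class name and the AZ/rack names in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group "/az/rack" network locations by AZ. */
public final class AzTopology {

    /** Uses everything before the last '/' as the AZ, e.g. "/az1/rack3" belongs to "/az1". */
    static Map<String, List<String>> racksByAz(List<String> rackLocations) {
        Map<String, List<String>> byAz = new TreeMap<>();
        for (String rack : rackLocations) {
            int slash = rack.lastIndexOf('/');
            if (slash <= 0) {
                throw new IllegalArgumentException("expected /az/rack but got " + rack);
            }
            String az = rack.substring(0, slash);
            byAz.computeIfAbsent(az, k -> new ArrayList<>()).add(rack);
        }
        return byAz;
    }

    public static void main(String[] args) {
        System.out.println(racksByAz(List.of("/us-east-1a/p1", "/us-east-1a/p2", "/us-east-1b/p1")));
        // {/us-east-1a=[/us-east-1a/p1, /us-east-1a/p2], /us-east-1b=[/us-east-1b/p1]}
    }
}
```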
> To achieve high availability tolerating zone failure, this JIRA proposes a
> new block placement policy that tries its best to place replicas in as many
> AZs and as many racks as possible, distributed as evenly as possible.
> For example, with 3 replicas, racks are chosen as follows:
> - 1AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}} to place
> replicas on as many racks as possible
> - 2AZ: randomly choose one rack in one AZ and randomly choose two racks in
> the other AZ
> - 3AZ: randomly choose one rack in every AZ
> - 4AZ: randomly choose three AZs and randomly choose one rack in every AZ
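The rack-selection cases above can be sketched as one rule: use min(replicas, #AZs) randomly chosen AZs, give them quotas that differ by at most one, then pick distinct random racks inside each AZ. This is a hypothetical Java illustration, not the patch for this JIRA; the 1AZ case only approximates the {{BlockPlacementPolicyRackFaultTolerant}} fallback, and racks are reused only when an AZ has fewer racks than its quota.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Hypothetical sketch of AZ-then-rack selection; not the actual HDFS-14786 code. */
public final class AzRackChooser {

    /** Returns one rack per replica, spread over as many AZs and racks as possible. */
    static List<String> chooseRacks(Map<String, List<String>> racksByAz, int replicas, Random rnd) {
        List<String> azs = new ArrayList<>(racksByAz.keySet());
        Collections.shuffle(azs, rnd); // random choice of which AZs are used
        int usedAzs = Math.min(replicas, azs.size());
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < usedAzs; i++) {
            // The first (replicas % usedAzs) AZs get one extra replica, e.g. 2+1 for 2AZ.
            int quota = replicas / usedAzs + (i < replicas % usedAzs ? 1 : 0);
            List<String> racks = new ArrayList<>(racksByAz.get(azs.get(i)));
            Collections.shuffle(racks, rnd);
            for (int j = 0; j < quota; j++) {
                // Distinct racks while they last; wrap around only if the AZ has too few racks.
                chosen.add(racks.get(j % racks.size()));
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> twoAz = new TreeMap<>();
        twoAz.put("/az1", List.of("/az1/r1", "/az1/r2", "/az1/r3"));
        twoAz.put("/az2", List.of("/az2/r1", "/az2/r2", "/az2/r3"));
        // One rack in one AZ and two racks in the other; which AZ gets two is random.
        System.out.println(chooseRacks(twoAz, 3, new Random()));
    }
}
```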
> After racks are picked, hosts are chosen randomly within the racks, honoring
> local storage, favored nodes, excluded nodes, storage types, etc. Data may
> become imbalanced if the topology is very uneven across AZs. This seems
> acceptable, since in the public cloud infrastructure provisioning is more
> flexible than in 1P.
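The host step after rack selection might look roughly like the following minimal Java sketch. It honors only the writer's local host and excluded nodes; favored nodes and storage types are left out, and all names are hypothetical rather than HDFS APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of picking a host inside an already chosen rack. */
public final class HostChooser {

    /**
     * Prefers the writer's own host if it is in this rack and not excluded; otherwise
     * returns a random non-excluded host, or empty if every host in the rack is excluded.
     */
    static Optional<String> chooseHost(List<String> hostsInRack, String writerHost,
                                       Set<String> excluded, Random rnd) {
        if (writerHost != null && hostsInRack.contains(writerHost) && !excluded.contains(writerHost)) {
            return Optional.of(writerHost);
        }
        List<String> candidates = new ArrayList<>();
        for (String host : hostsInRack) {
            if (!excluded.contains(host)) {
                candidates.add(host);
            }
        }
        if (candidates.isEmpty()) {
            return Optional.empty(); // caller would have to try another rack
        }
        return Optional.of(candidates.get(rnd.nextInt(candidates.size())));
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("h1", "h2", "h3");
        System.out.println(chooseHost(hosts, "h2", Set.of(), new Random()));         // Optional[h2]
        System.out.println(chooseHost(hosts, "h2", Set.of("h2", "h3"), new Random())); // Optional[h1]
    }
}
```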
--
This message was sent by Atlassian Jira
(v8.3.2#803003)