[ 
https://issues.apache.org/jira/browse/HDFS-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917325#comment-16917325
 ] 

Mingliang Liu commented on HDFS-14786:
--------------------------------------

Thoughts? [~apurtell]  [~szetszwo]  [~jojochuang]  [[email protected]] Thanks!

> A new block placement policy tolerating availability zone failure
> -----------------------------------------------------------------
>
>                 Key: HDFS-14786
>                 URL: https://issues.apache.org/jira/browse/HDFS-14786
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: block placement
>            Reporter: Mingliang Liu
>            Priority: Major
>
> {{NetworkTopology}} assumes "/datacenter/rack/host" 3 layer topology. Default 
> block placement policies are rack awareness for better fault tolerance Newer 
> block placement policy like {{BlockPlacementPolicyRackFaultTolerant}} tries 
> its best to place the replicas to most racks, which further tolerates more 
> racks failing. [HADOOP-8470] brought {{NetworkTopologyWithNodeGroup}} to add 
> another layer under rack, i.e. "/datacenter/rack/host/nodegroup" 4 layer 
> topology. With that, replicas within a rack can be placed in different node 
> groups for better isolation.
> Existing block placement policies tolerate rack failure since at least two 
> racks are chosen in those cases. Chances are all replicas could be placed in 
> the same datacenter, though there are multiple data centers in the same 
> cluster topology. In other words, fault of higher layers beyond rack is not 
> well tolerated.
> However, more deployments in public cloud are leveraging multiple available 
> zones (AZ) for high-availability since the inter-AZ latency seems affordable 
> in many cases. In a single AZ, some cloud providers like AWS support 
> [partitioned placement 
> groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition]
>  which basically are different racks. A simple network topology mapped to 
> HDFS is "/availabilityzone/rack/host" 3 layers.
> To achieve high availability tolerating zone failure, this JIRA proposes a 
> new data placement policy which tries its best to place replicas in most AZs, 
> most racks, and most evenly distributed.
> Examples with 3 replicas, we choose racks as following:
> # 1AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}} to most racks
> # 2AZ: randomly choose one rack in 1st AZ and randomly choose two racks in 
> the other AZ
> # 3AZ: randomly choose one rack in one AZ
> # 4AZ: randomly choose three AZ and one rack in each AZ
> After racks are chosen, hosts are chosen randomly honoring local storage, 
> favorite nodes, excluded nodes, storage types etc.
> Data may become imbalance if topology is very uneven in AZs. This seems not a 
> problem as in public cloud, infrastructure provisioning is more flexible than 
> 1P.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to