[jira] [Commented] (HDFS-14786) A new block placement policy tolerating availability zone failure

Mingliang Liu (Jira) Fri, 30 Aug 2019 15:29:12 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919931#comment-16919931
 ]


Mingliang Liu commented on HDFS-14786:
--------------------------------------

[~surendrasingh] For now in public cloud, there are not many regions which have 
more than 4 AZs. In our use case, most if not all data have exactly 3 replicas, 
so adding one more AZ does not significantly improve HDFS data availability as 
a whole. Thanks for bring up erasure coding (EC, really good point. And in EC, 
data+parity blocks are distributed evenly across the racks. The default policy 
{{BlockPlacementPolicyRackFaultTolerant}} works perfectly fine for EC if we 
don't need to tolerate higher level, call it *zone* (meaning "datacenter" in 
traditional 1P data-center, or "AZ" in public cloud). But since the racks are 
chosen randomly regardless zone distribution, it would not tolerate AZ failure 
in public cloud.

The example you gave makes perfect sense to me. I think this new block 
placement policy could also be used by EC as well. I think the implementation 
may not need to make much assumption regarding replica vs. EC.

> A new block placement policy tolerating availability zone failure
> -----------------------------------------------------------------
>
>                 Key: HDFS-14786
>                 URL: https://issues.apache.org/jira/browse/HDFS-14786
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: block placement
>            Reporter: Mingliang Liu
>            Priority: Major
>
> {{NetworkTopology}} assumes "/datacenter/rack/host" 3 layer topology. Default 
> block placement policies are rack awareness for better fault tolerance. Newer 
> block placement policy like {{BlockPlacementPolicyRackFaultTolerant}} tries 
> its best to place the replicas to most racks, which further tolerates more 
> racks failing. HADOOP-8470 brought {{NetworkTopologyWithNodeGroup}} to add 
> another layer under rack, i.e. "/datacenter/rack/host/nodegroup" 4 layer 
> topology. With that, replicas within a rack can be placed in different node 
> groups for better isolation.
> Existing block placement policies tolerate one rack failure since at least 
> two racks are chosen in those cases. Chances are all replicas could be placed 
> in the same datacenter, though there are multiple data centers in the same 
> cluster topology. In other words, fault of higher layers beyond rack is not 
> well tolerated.
> However, more deployments in public cloud are leveraging multiple available 
> zones (AZ) for high-availability since the inter-AZ latency seems affordable 
> in many cases. In a single AZ, some cloud providers like AWS support 
> [partitioned placement 
> groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition]
>  which basically are different racks. A simple network topology mapped to 
> HDFS is "/availabilityzone/rack/host" 3 layers.
> To achieve high availability tolerating zone failure, this JIRA proposes a 
> new data placement policy which tries its best to place replicas in most AZs, 
> most racks, and most evenly distributed.
> Examples with 3 replicas, we choose racks as following:
>  - 1AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}} to place among 
> most racks
>  - 2AZ: randomly choose one rack in one AZ and randomly choose two racks in 
> the other AZ
>  - 3AZ: randomly choose one rack in every AZ
>  - 4AZ: randomly choose three AZs and randomly choose one rack in every AZ
> After racks are picked, hosts are chosen randomly within racks honoring local 
> storage, favorite nodes, excluded nodes, storage types etc. Data may become 
> imbalance if topology is very uneven in AZs. This seems not a problem as in 
> public cloud, infrastructure provisioning is more flexible than 1P.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HDFS-14786) A new block placement policy tolerating availability zone failure

Reply via email to