[ https://issues.apache.org/jira/browse/HDFS-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917325#comment-16917325 ]
Mingliang Liu commented on HDFS-14786:
--------------------------------------
Thoughts? [~apurtell] [~szetszwo] [~jojochuang] [[email protected]] Thanks!
> A new block placement policy tolerating availability zone failure
> -----------------------------------------------------------------
>
> Key: HDFS-14786
> URL: https://issues.apache.org/jira/browse/HDFS-14786
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: block placement
> Reporter: Mingliang Liu
> Priority: Major
>
> {{NetworkTopology}} assumes a 3-layer "/datacenter/rack/host" topology. The
> default block placement policies are rack-aware for better fault tolerance.
> Newer policies such as {{BlockPlacementPolicyRackFaultTolerant}} try to
> spread replicas across as many racks as possible, which tolerates the failure
> of more racks. [HADOOP-8470] introduced {{NetworkTopologyWithNodeGroup}},
> which adds a node-group layer below the rack, i.e. the 4-layer
> "/datacenter/rack/nodegroup/host" topology. With it, replicas within a rack
> can be placed in different node groups for better isolation.
> Existing block placement policies tolerate rack failure because they choose
> at least two racks. Still, all replicas may end up in the same data center
> even when the cluster topology spans multiple data centers. In other words,
> failures at layers above the rack are not well tolerated.
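The point about rack versus data-center failure can be checked mechanically. A replica set survives the loss of any single node at a given topology layer exactly when its replicas span at least two distinct nodes at that layer. Below is a minimal Java sketch that assumes network locations are plain strings such as "/dc1/rack2". The class `LayerFaultTolerance` and its methods are hypothetical illustrations, not HDFS APIs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not an HDFS class): single-failure tolerance at one topology layer. */
final class LayerFaultTolerance {
  private LayerFaultTolerance() {}

  /**
   * Returns the ancestor of a network location at the given depth,
   * e.g. prefix("/dc1/rack2", 1) is "/dc1" and prefix("/dc1/rack2", 2) is "/dc1/rack2".
   */
  static String prefix(String location, int depth) {
    String[] parts = location.split("/"); // parts[0] is "" because of the leading '/'
    StringBuilder sb = new StringBuilder();
    for (int i = 1; i <= depth && i < parts.length; i++) {
      sb.append('/').append(parts[i]);
    }
    return sb.toString();
  }

  /** True if losing any one node at {@code depth} still leaves at least one replica. */
  static boolean toleratesSingleFailure(List<String> replicaLocations, int depth) {
    Set<String> distinct = new HashSet<>();
    for (String loc : replicaLocations) {
      distinct.add(prefix(loc, depth));
    }
    return distinct.size() >= 2;
  }
}
```

Consider replicas at "/dc1/rack1", "/dc1/rack2" and "/dc1/rack2". They tolerate a rack failure (depth 2) but not a data-center failure (depth 1), which is exactly the gap described above.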
> Meanwhile, more and more public cloud deployments use multiple availability
> zones (AZs) for high availability, since the inter-AZ latency is acceptable
> in many cases. Within a single AZ, some cloud providers such as AWS support
> [partition placement groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition],
> which are essentially separate racks. A simple network topology for HDFS is
> then the 3-layer "/availabilityzone/rack/host".
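To make the "/availabilityzone/rack/host" mapping concrete, here is a minimal Java sketch that groups hosts into an AZ → rack → hosts tree. It assumes each host's network location is reported as a two-level path "/az/rack", with the host as the leaf. The class `AzTopology` is hypothetical and is not part of HDFS's `NetworkTopology`.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical AZ -> rack -> hosts view of a 3-layer topology. */
final class AzTopology {
  /** az -> rack -> hosts; sorted maps keep iteration order deterministic. */
  final Map<String, Map<String, Set<String>>> tree = new TreeMap<>();

  /** Adds a host whose network location is "/az/rack". */
  void add(String host, String location) {
    String[] parts = location.split("/"); // "/az1/rack1" -> ["", "az1", "rack1"]
    if (parts.length != 3 || !parts[0].isEmpty()) {
      throw new IllegalArgumentException("expected /az/rack but got " + location);
    }
    tree.computeIfAbsent(parts[1], az -> new TreeMap<>())
        .computeIfAbsent(parts[2], rack -> new TreeSet<>())
        .add(host);
  }

  int numAzs() {
    return tree.size();
  }

  int numRacks(String az) {
    Map<String, Set<String>> racks = tree.get(az);
    return racks == null ? 0 : racks.size();
  }
}
```

Racks are keyed per AZ, so "/az1/rack1" and "/az2/rack1" are counted as different racks. The AZ and rack counts are the inputs that the per-AZ cases below depend on.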
> To tolerate AZ failure, this JIRA proposes a new block placement policy that
> spreads replicas across as many AZs as possible, then across as many racks as
> possible, keeping the distribution as even as possible.
> For example, with 3 replicas, racks are chosen as follows:
> # 1 AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}}, i.e. spread
> the replicas across as many racks as possible
> # 2 AZs: randomly choose one rack in the first AZ and two racks in the other AZ
> # 3 AZs: randomly choose one rack in each AZ
> # 4 AZs: randomly choose three AZs and one rack in each of them
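The per-AZ cases above follow from one rule: use min(replicas, #AZs) randomly ordered AZs and split the replicas among them as evenly as possible, then pick distinct random racks within each AZ. The following Java sketch illustrates that reading of the list under stated assumptions. It is not the proposed patch. `AzRackChooser` is hypothetical, every AZ is assumed to contain at least one rack, and reusing racks when an AZ has fewer racks than replicas is a simplification of the real rack-fault-tolerant fallback.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of AZ-first, then rack, replica spreading. */
final class AzRackChooser {
  private AzRackChooser() {}

  /**
   * Returns one "/az/rack" per replica. Assumes every AZ in racksByAz has
   * at least one rack.
   */
  static List<String> chooseRacks(Map<String, List<String>> racksByAz,
                                  int replicas, Random rnd) {
    List<String> result = new ArrayList<>();
    if (replicas <= 0 || racksByAz.isEmpty()) {
      return result;
    }
    List<String> azs = new ArrayList<>(racksByAz.keySet());
    Collections.sort(azs);         // fixed order so the result depends only on rnd
    Collections.shuffle(azs, rnd); // random AZ order

    // e.g. 3 replicas over 4 AZs uses only 3 AZs.
    int used = Math.min(replicas, azs.size());
    for (int i = 0; i < used; i++) {
      // Even split: the first (replicas % used) AZs get one extra replica.
      int count = replicas / used + (i < replicas % used ? 1 : 0);
      String az = azs.get(i);
      List<String> racks = new ArrayList<>(racksByAz.get(az));
      Collections.sort(racks);
      Collections.shuffle(racks, rnd);
      for (int j = 0; j < count; j++) {
        // Distinct racks while they last; wraps around if the AZ has too few racks.
        result.add("/" + az + "/" + racks.get(j % racks.size()));
      }
    }
    return result;
  }
}
```

With 3 replicas this produces three distinct racks in one AZ, a 2 + 1 split across two AZs, one rack in each of three AZs, or three of four AZs with one rack each. Sorting before shuffling is a design choice so that a fixed seed gives a reproducible result regardless of map iteration order.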
> After the racks are chosen, hosts are chosen randomly, honoring local storage,
> favored nodes, excluded nodes, storage types, etc.
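As an illustration of the host step, here is a minimal Java sketch that picks a host within an already chosen rack. It skips excluded nodes and prefers favored nodes when any remain. It covers only those two constraints; local storage and storage types are left out. `HostChooser` is a hypothetical name, and the sketch does not reproduce HDFS's `chooseRandom` logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;

/** Hypothetical host picker for a single, already chosen rack. */
final class HostChooser {
  private HostChooser() {}

  /** Random non-excluded host, preferring favored ones; empty if none is usable. */
  static Optional<String> chooseHost(List<String> hostsInRack, Set<String> favored,
                                     Set<String> excluded, Random rnd) {
    List<String> candidates = new ArrayList<>();
    List<String> favoredCandidates = new ArrayList<>();
    for (String h : hostsInRack) {
      if (excluded.contains(h)) {
        continue; // never place on an excluded node
      }
      candidates.add(h);
      if (favored.contains(h)) {
        favoredCandidates.add(h);
      }
    }
    List<String> pool = favoredCandidates.isEmpty() ? candidates : favoredCandidates;
    if (pool.isEmpty()) {
      return Optional.empty(); // caller would fall back to another rack
    }
    return Optional.of(pool.get(rnd.nextInt(pool.size())));
  }
}
```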
> Data may become imbalanced if the topology is very uneven across AZs. This
> seems acceptable because in the public cloud, infrastructure provisioning is
> more flexible than in 1P.
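To give a sense of how large that imbalance can be, here is a back-of-the-envelope calculation with hypothetical numbers, not measurements. Assume two AZs: A with 10 hosts and B with 2 hosts, replication factor 3, and $N$ blocks. Also assume the AZ that receives two replicas is picked uniformly at random per block, so each AZ stores 1.5 replicas per block on average.

```latex
\begin{aligned}
\text{replicas per host in } A &= \frac{1.5\,N}{10} = 0.15\,N,\\
\text{replicas per host in } B &= \frac{1.5\,N}{2} = 0.75\,N,\\
\frac{\text{load per host in } B}{\text{load per host in } A} &= \frac{0.75\,N}{0.15\,N} = 5.
\end{aligned}
```

Under these assumptions, hosts in the smaller AZ fill about five times faster. This is the kind of unevenness the proposal expects to be handled by provisioning AZs more evenly.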
--
This message was sent by Atlassian Jira
(v8.3.2#803003)