[
https://issues.apache.org/jira/browse/HDFS-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486614#comment-14486614
]
Walter Su commented on HDFS-7613:
---------------------------------
As discussed in HDFS-7891, we think maximizing rack failure tolerance may not be
suitable for EC. Here we discuss a specific policy for EC.
As suggested by [~szetszwo], tolerating a single rack failure is enough for
both EC and non-EC blocks, because normally only a network or power failure
brings a rack down, and the rack comes back up soon, so the failure does not
cause data loss. In short, fault tolerance for one rack is enough.
I propose a policy for EC:
||rack_0||rack_1||rack_2||rack_3||rack_4||rack_5||
|data_0|data_1|data_2|data_3|data_4|data_5|
|prty_0|prty_1|prty_2|
Data blocks spread across as many racks as possible. Each parity block is
colocated with a data block. As [~szetszwo] said, we should allow at most 2
replicas per rack. One difference is that we should not place data_3 and
data_4 together on rack_3: client-side reconstruction should not slow down
client reads. Once rack_0 is down, the client only needs to reconstruct
data_0, while prty_0 is left for the NN to reconstruct later. But if data_3
and data_4 shared rack_3 and it went down, the client would need to
reconstruct 2 blocks before it could read them. In short, we should separate
data_3 and data_4.
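The layout in the table above can be sketched as a small placement function. This is a hypothetical illustration, not the actual HDFS BlockPlacementPolicy API: each data block gets its own rack, and each parity block shares a rack with one data block, so no rack holds more than 2 blocks and no rack holds 2 data blocks.

```java
import java.util.Arrays;

// Hypothetical sketch of the proposed EC placement policy.
public class EcPlacementSketch {

    // Returns the rack index of each block, ordered
    // data_0..data_{d-1}, prty_0..prty_{p-1}.
    static int[] placeBlocks(int dataBlocks, int parityBlocks, int racks) {
        if (dataBlocks > racks) {
            throw new IllegalArgumentException("need one rack per data block");
        }
        if (parityBlocks > dataBlocks) {
            throw new IllegalArgumentException("cannot colocate every parity block");
        }
        int[] rackOf = new int[dataBlocks + parityBlocks];
        for (int i = 0; i < dataBlocks; i++) {
            rackOf[i] = i;                      // data_i alone on rack_i
        }
        for (int j = 0; j < parityBlocks; j++) {
            rackOf[dataBlocks + j] = j;         // prty_j shares rack_j with data_j
        }
        return rackOf;
    }

    public static void main(String[] args) {
        // 6 data + 3 parity on 6 racks, matching the table above.
        System.out.println(Arrays.toString(placeBlocks(6, 3, 6)));
        // -> [0, 1, 2, 3, 4, 5, 0, 1, 2]
    }
}
```

With this layout, losing any single rack costs at most 2 blocks, and at most 1 of them is a data block, so a client read needs at most 1 on-the-fly reconstruction.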
> Block placement policy for erasure coding groups
> ------------------------------------------------
>
> Key: HDFS-7613
> URL: https://issues.apache.org/jira/browse/HDFS-7613
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Zhe Zhang
> Assignee: Walter Su
>
> Blocks in an erasure coding group should be placed in different failure
> domains -- different DataNodes at the minimum, and different racks ideally.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)