[jira] [Commented] (HDFS-13788) Update EC documentation about rack fault tolerance

Xiao Chen (JIRA) Mon, 13 Aug 2018 10:34:13 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578667#comment-16578667
 ]


Xiao Chen commented on HDFS-13788:
----------------------------------

Thanks [~knanasi] for the patch and [~zvenczel] for the review.
bq. For rack fault-tolerance, it is also important to have at least as many 
racks as the configured number of EC parity cells.
This is technically not correct. In the case of RS(3,2), having 2 racks is not 
safe: with 5 blocks on 2 racks, one rack holds at least 3 blocks, and losing 
that rack loses more blocks than the 2 parity blocks can recover.
I suggest we word it something like: ... to have enough racks, so that on 
average, each rack holds no more blocks than the number of EC parity blocks. 
A formula to calculate this would be (data blocks + parity blocks) / parity 
blocks, rounding up.

Then in the RS(6,3) example, we can add the calculation: ... minimally 3 racks 
(calculated by (6 + 3) / 3 = 3) ...
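
For illustration only, the suggested formula can be sketched in Python. The 
function names below are hypothetical and are not part of HDFS; the sketch 
assumes blocks are spread as evenly as possible across racks.

```python
def min_racks(data_blocks: int, parity_blocks: int) -> int:
    """Smallest rack count such that an even spread puts no more than
    parity_blocks blocks of one block group on any single rack."""
    total = data_blocks + parity_blocks
    # Integer ceiling division: (data + parity) / parity, rounded up.
    return -(-total // parity_blocks)


def survives_one_rack_loss(data_blocks: int, parity_blocks: int, racks: int) -> bool:
    """True if, with an even spread, the fullest rack holds at most
    parity_blocks blocks, so losing it is still recoverable."""
    total = data_blocks + parity_blocks
    fullest_rack = -(-total // racks)  # ceiling of total / racks
    return fullest_rack <= parity_blocks


print(min_racks(6, 3))                  # RS(6,3)
print(min_racks(3, 2))                  # RS(3,2)
print(survives_one_rack_loss(3, 2, 2))  # RS(3,2) on 2 racks
```

Under these assumptions, RS(6,3) needs 3 racks, and RS(3,2) also needs 3, 
which is why 2 racks (the parity count) is not enough for RS(3,2).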


It'd be great if we can add a note at the end as well, after:
bq. ...will still attempt to spread a striped file across multiple nodes to 
preserve node-level fault-tolerance.
For this reason, it is recommended to set up racks with a similar number of 
DataNodes. 
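
To show why the rack sizes matter, here is a rough sketch under a simplified 
model. It is not HDFS code: the function name is hypothetical, and it assumes 
each DataNode stores at most one block of a given block group and that a rack 
may hold at most as many blocks as there are parity blocks.

```python
from typing import Sequence


def rack_tolerant_placement_exists(
    datanodes_per_rack: Sequence[int], data_blocks: int, parity_blocks: int
) -> bool:
    """Check whether one block group can be placed so that no rack holds
    more than parity_blocks blocks and no DataNode holds more than one."""
    # A rack can contribute at most min(its DataNodes, parity_blocks) slots.
    usable_slots = sum(min(n, parity_blocks) for n in datanodes_per_rack)
    return usable_slots >= data_blocks + parity_blocks


# RS(6,3) on three racks with 9 DataNodes in total:
print(rack_tolerant_placement_exists([3, 3, 3], 6, 3))  # balanced racks
print(rack_tolerant_placement_exists([7, 1, 1], 6, 3))  # unbalanced racks
```

In this model the balanced layout admits a rack-fault-tolerant placement, 
while the unbalanced one does not, even though both have 3 racks and 9 
DataNodes.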

> Update EC documentation about rack fault tolerance
> --------------------------------------------------
>
>                 Key: HDFS-13788
>                 URL: https://issues.apache.org/jira/browse/HDFS-13788
>             Project: Hadoop HDFS
>          Issue Type: Task
>          Components: documentation, erasure-coding
>    Affects Versions: 3.0.0
>            Reporter: Xiao Chen
>            Assignee: Kitti Nanasi
>            Priority: Major
>         Attachments: HDFS-13788.001.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> Theoretical minimum is 3 racks, and ideally 9 or more, so the document should 
> be updated.
> (I didn't check timestamps, but this is probably because 
> {{BlockPlacementPolicyRackFaultTolerant}} wasn't completely done when 
> HDFS-9088 introduced this doc. There are also later examples in 
> {{TestErasureCodingMultipleRacks}} that test this explicitly.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

