[
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927280#comment-16927280
]
Zhao Yi Ming commented on HDFS-14699:
-------------------------------------
[~marvelrock] This fix only makes sure that numReplicas is correct and that
an over-hardlimit srcNode is not added to the srcNodes list, so the
reconstruction work will NOT use any over-hardlimit srcNode.
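
A minimal sketch of the behaviour described above, under stated assumptions:
it is a simplified model, not the actual NameNode code from the patch. The
class, the Node and Selection types, and the chooseSources method are
hypothetical names used only for illustration, and the ">=" comparison against
the hard limit is an assumption. The point it shows is the ordering: a node is
counted as a live replica first, and only afterwards skipped as a source if it
is over the hard limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not the real BlockManager implementation.
final class SourceSelectionSketch {

    /** Stand-in for a DataNode that holds one internal block. */
    static final class Node {
        final String name;
        final int pendingStreams; // replication streams already scheduled on this node

        Node(String name, int pendingStreams) {
            this.name = name;
            this.pendingStreams = pendingStreams;
        }
    }

    /** Live replica count plus the nodes that may act as reconstruction sources. */
    static final class Selection {
        final int liveReplicas;
        final List<Node> srcNodes;

        Selection(int liveReplicas, List<Node> srcNodes) {
            this.liveReplicas = liveReplicas;
            this.srcNodes = srcNodes;
        }
    }

    static Selection chooseSources(List<Node> liveNodes, int hardLimit) {
        int liveReplicas = 0;
        List<Node> srcNodes = new ArrayList<>();
        for (Node node : liveNodes) {
            // Count the replica first, so a busy node still counts as live.
            liveReplicas++;
            // Assumed check: a node at or over the hard limit is not used as a source.
            if (node.pendingStreams >= hardLimit) {
                continue;
            }
            srcNodes.add(node);
        }
        return new Selection(liveReplicas, srcNodes);
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("dn1", 0), new Node("dn2", 4), new Node("dn3", 1));
        Selection s = chooseSources(nodes, 4);
        System.out.println(s.liveReplicas + " live, " + s.srcNodes.size() + " usable sources");
    }
}
```

In this model, dn2 is at the assumed hard limit of 4, so it still counts toward
the live replicas but is left out of the source list.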
> Erasure Coding: Storage not considered in live replica when replication
> streams hard limit reached to threshold
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec
> Affects Versions: 3.2.0, 3.1.1, 3.3.0
> Reporter: Zhao Yi Ming
> Assignee: Zhao Yi Ming
> Priority: Critical
> Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch,
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch,
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png,
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881.
> Our testing steps follow; we hope they are helpful. (The DNs mentioned below
> hold the internal blocks under test.)
> # We customized a new 10-2-1024k policy and applied it to a path. Now we have
> 12 internal blocks (12 live blocks).
> # Decommission one DN. After the decommission completes, we have 13 internal
> blocks (12 live blocks and 1 decommissioned block).
> # Shut down one DN that does not hold the same block ID as the decommissioned
> block. Now we have 12 internal blocks (11 live blocks and 1 decommissioned
> block).
> # After waiting about 600s (before the heartbeat comes), recommission the
> decommissioned DN. Now we have 12 internal blocks (11 live blocks and 1
> duplicate block).
> # EC then does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production
> environment. Could you help? Thanks a lot!
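
The block counts in the quoted steps can be checked with a small model. This
is only a sketch of the arithmetic, not NameNode code: the class and method
names are hypothetical, and the concrete indices (index 0 on the
decommissioned DN, index 5 on the DN that was shut down) are assumptions
chosen for illustration. It shows that after the last step there are 12 live
storages but only 11 distinct internal block indices, so one internal block is
still missing.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the counts in the reproduction steps.
final class StripedBlockCountSketch {

    // 10-2-1024k policy: 10 data blocks + 2 parity blocks.
    static final int TOTAL_INTERNAL_BLOCKS = 12;

    /** Returns the internal block indices that have no live copy. */
    static Set<Integer> missingIndices(List<Integer> liveIndices) {
        Set<Integer> missing = new TreeSet<>();
        for (int i = 0; i < TOTAL_INTERNAL_BLOCKS; i++) {
            missing.add(i);
        }
        missing.removeAll(liveIndices);
        return missing;
    }

    public static void main(String[] args) {
        // Assumed state after recommissioning: index 0 is held twice (the original
        // copy plus the copy made during decommission), index 5 was on the DN
        // that was shut down.
        List<Integer> live = List.of(0, 0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11);
        System.out.println("live storages: " + live.size());
        System.out.println("missing indices: " + missingIndices(live));
    }
}
```

Counting storages alone gives 12, which matches the policy total, while
counting distinct indices exposes the missing block.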