[
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhao Yi Ming updated HDFS-14699:
--------------------------------
Attachment: HDFS-14699.05.patch
Status: Patch Available (was: Open)
[~ayushtkn] Sorry for the misunderstanding, and thanks for your explanation!
I changed the code to avoid recalculating the block index. In the UT I had made
a small mistake:
NOT incrementing Pending Replications on the node that has the duplicate EC
internal block. The UT code is now updated as well. Could you help review again? Thanks!
> Erasure Coding: Can NOT trigger the reconstruction when have the dup internal
> blocks and missing one internal block
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec
> Affects Versions: 3.1.1, 3.2.0, 3.3.0
> Reporter: Zhao Yi Ming
> Assignee: Zhao Yi Ming
> Priority: Critical
> Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch,
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch,
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png,
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881.
> Our testing steps follow; we hope they are helpful. (The DNs mentioned below
> hold the internal blocks under test.)
> # We customized a new 10-2-1024k policy and applied it to a path. Now we have
> 12 internal blocks (12 live blocks).
> # Decommission one DN. After the decommission completes, we have 13
> internal blocks (12 live blocks and 1 decommissioned block).
> # Shut down one DN that does not hold the same block id as the
> decommissioned block. Now we have 12 internal blocks (11 live blocks and 1
> decommissioned block).
> # After waiting about 600s (before the heartbeat comes), recommission the
> decommissioned DN. Now we have 12 internal blocks (11 live blocks and 1
> duplicate block).
> # EC then does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production
> environment. Could you help? Thanks a lot!
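
The state reached in step 4 can be sketched in a few lines of Java. This is only an illustration of why a raw count of internal blocks is not enough to tell whether the group is complete; it is not the HDFS BlockManager code. The class name EcGroupCheck, the method missingIndices, and the choice of index 3 as the duplicated block and index 7 as the lost block are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; not taken from the HDFS source. */
class EcGroupCheck {

    /**
     * Returns the internal block indices (0-based) that have no live copy.
     * Duplicate copies of the same index are counted once.
     */
    static List<Integer> missingIndices(int totalBlocks, int[] liveIndices) {
        BitSet seen = new BitSet(totalBlocks);
        for (int idx : liveIndices) {
            seen.set(idx);
        }
        List<Integer> missing = new ArrayList<>();
        for (int i = seen.nextClearBit(0); i < totalBlocks; i = seen.nextClearBit(i + 1)) {
            missing.add(i);
        }
        return missing;
    }

    public static void main(String[] args) {
        int total = 10 + 2; // 10 data + 2 parity blocks for the 10-2-1024k policy

        // Assumed state after step 4: index 3 is held twice (the recommissioned
        // DN plus the copy made during decommission) and index 7 is lost.
        int[] live = {0, 1, 2, 3, 3, 4, 5, 6, 8, 9, 10, 11};

        System.out.println("live copies = " + live.length);
        System.out.println("missing indices = " + missingIndices(total, live));
    }
}
```

With these assumed indices there are 12 live copies, which equals the expected group size, yet one index has no copy at all. A check based on distinct indices reports the gap, while a check based on the number of copies does not.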