[ 
https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174447#comment-15174447
 ] 

Jing Zhao commented on HDFS-8786:
---------------------------------

Thanks for updating the patch, [~rakeshr]. Comments on the current patch:
1. For ErasureCodingWork, we have to handle the following scenarios when 
{{hasAllInternalBlocks}} returns true:
#* we have a decommissioning DN
#* we have enough DNs but not enough racks
#* both of the above happen at the same time
Things get complicated when the decommissioning case and the not-enough-racks 
case are mixed. For example, there may be 9 live internal blocks on 5 racks, 
plus 1 more internal block on a decommissioning datanode. In this situation we 
should choose only 1 target, and the decommissioning DN should be ignored. In 
another example, if we have 8 live replicas and 1 decommissioning replica, we 
should replicate the decommissioning replica. It looks to me like the current 
patch cannot handle all of these scenarios.

For now, I think we should explicitly let ErasureCodingWork know whether the 
reconstruction work was triggered by not-enough-racks. We can do this check in 
{{validateReconstructionWork}} and pass the result into the ErasureCodingWork 
instance. Later, when adding the task to the DN, we should first check this 
result: if it is true, run the current code added by HDFS-9818. If it is false, 
we check whether the source nodes cover all the internal blocks but include a 
decommissioning datanode, and schedule replication work for that datanode if 
necessary.
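
To make the proposed ordering of checks concrete, here is a rough, standalone 
sketch. It is not the patch and does not use the real BlockManager or 
ErasureCodingWork APIs: the class {{EcWorkSketch}}, the {{Action}} enum, the 
{{decide}} method and its parameters are all hypothetical names, and internal 
blocks are modeled simply as index bits. The {{notEnoughRacks}} flag stands in 
for the result that would be computed in {{validateReconstructionWork}}.

```java
import java.util.BitSet;

public class EcWorkSketch {
    /** Hypothetical outcomes; the names are illustrative, not from the patch. */
    public enum Action {
        SPREAD_ACROSS_RACKS,        // the existing not-enough-racks path (HDFS-9818)
        REPLICATE_DECOMMISSIONING,  // copy blocks that live only on decommissioning DNs
        RECONSTRUCT_MISSING,        // some internal block has no source at all
        NOTHING_TO_DO
    }

    /**
     * @param notEnoughRacks         assumed to be decided earlier, during validation
     * @param liveIndices            internal block indices stored on live DNs
     * @param decommissioningIndices internal block indices stored on decommissioning DNs
     * @param totalInternalBlocks    e.g. 9 for a 6 data + 3 parity layout
     */
    public static Action decide(boolean notEnoughRacks, BitSet liveIndices,
                                BitSet decommissioningIndices, int totalInternalBlocks) {
        // Check the rack flag first, as suggested in the comment.
        if (notEnoughRacks) {
            return Action.SPREAD_ACROSS_RACKS;
        }
        // Do the source nodes (live plus decommissioning) cover every internal block?
        BitSet covered = (BitSet) liveIndices.clone();
        covered.or(decommissioningIndices);
        if (covered.cardinality() < totalInternalBlocks) {
            return Action.RECONSTRUCT_MISSING;
        }
        // Indices whose only copy sits on a decommissioning DN need replication.
        BitSet onlyOnDecommissioning = (BitSet) decommissioningIndices.clone();
        onlyOnDecommissioning.andNot(liveIndices);
        return onlyOnDecommissioning.isEmpty()
            ? Action.NOTHING_TO_DO
            : Action.REPLICATE_DECOMMISSIONING;
    }

    public static void main(String[] args) {
        // First example: 9 live internal blocks, plus a duplicate on a decommissioning DN.
        BitSet live = new BitSet();
        live.set(0, 9);
        BitSet decommissioning = new BitSet();
        decommissioning.set(3);
        System.out.println(decide(true, live, decommissioning, 9));

        // Second example: 8 live internal blocks, the 9th only on a decommissioning DN.
        live = new BitSet();
        live.set(0, 8);
        decommissioning = new BitSet();
        decommissioning.set(8);
        System.out.println(decide(false, live, decommissioning, 9));
    }
}
```

In this sketch the first example takes the rack path and ignores the 
decommissioning DN, while the second schedules replication of the 
decommissioning block. The case where both conditions hold at once is resolved 
here purely by checking the rack flag first; whether that is sufficient is 
exactly the open question above.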

> Erasure coding: DataNode should transfer striped blocks before being 
> decommissioned
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-8786
>                 URL: https://issues.apache.org/jira/browse/HDFS-8786
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: Rakesh R
>         Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, 
> HDFS-8786-003.patch, HDFS-8786-draft.patch
>
>
> Per [discussion | 
> https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004]
>  under HDFS-8697, it's too expensive to reconstruct block groups for decomm 
> purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
