[ https://issues.apache.org/jira/browse/HDFS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479315#comment-13479315 ]

Kihwal Lee commented on HDFS-4075:
----------------------------------

On recommissioning, the dead nodes will not cause this overhead at that moment 
(i.e. not within the same write-lock hold). They will cause their own logging 
storm when they rejoin and send in their full block reports, which would 
block the namenode for 6-7 seconds in the above example. At least they will let 
other operations run in between such block reports. Alternatively, the nodes can 
be brought up in a controlled manner to reduce the impact, e.g. two datanode 
start-ups per minute.

But the live nodes at the time of recommissioning can cause problems, unless 
processing of potentially over-replicated blocks becomes asynchronous to 
recommissioning and is also throttled. Doing the invalidation inline while 
periodically pausing to release the lock won't be ideal, since it will prolong 
the execution of the refreshNodes command. Deferring this work to the 
mis-replicated blocks handling can make it asynchronous, but it cannot be 
throttled: at the next block report, all of it will be processed.
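A minimal Java sketch of what "asynchronous and throttled" could look like. Everything here is an assumption for illustration, not HDFS code: the hypothetical `ThrottledExcessBlockProcessor` has recommissioning only enqueue block IDs, and a worker drains that queue in fixed-size batches, taking the write lock once per batch. A `ReentrantReadWriteLock` stands in for the namesystem lock, a `LongPredicate` stands in for the real over-replication check, and a plain list stands in for the invalidate list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongPredicate;

// Hypothetical sketch; not part of HDFS.
public class ThrottledExcessBlockProcessor {
    private final ReentrantReadWriteLock fsLock;     // stand-in for the namesystem lock
    private final LongPredicate isOverReplicated;    // stand-in for the real replica check
    private final int maxBlocksPerBatch;             // throttle: blocks handled per lock hold
    private final Queue<Long> pending = new ConcurrentLinkedQueue<>();
    private final List<Long> invalidateList = new ArrayList<>(); // stand-in for the invalidate list
    private int batches = 0;

    public ThrottledExcessBlockProcessor(ReentrantReadWriteLock fsLock,
            LongPredicate isOverReplicated, int maxBlocksPerBatch) {
        this.fsLock = fsLock;
        this.isOverReplicated = isOverReplicated;
        this.maxBlocksPerBatch = maxBlocksPerBatch;
    }

    // Called on recommission: only records block IDs, no per-block checks or logging.
    public void enqueue(List<Long> blockIds) {
        pending.addAll(blockIds);
    }

    // Handles at most maxBlocksPerBatch blocks under a single write-lock hold.
    public int processOneBatch() {
        int done = 0;
        fsLock.writeLock().lock();
        try {
            Long id;
            while (done < maxBlocksPerBatch && (id = pending.poll()) != null) {
                if (isOverReplicated.test(id)) {
                    invalidateList.add(id);
                }
                done++;
            }
            if (done > 0) {
                batches++;
            }
        } finally {
            fsLock.writeLock().unlock();
        }
        return done;
    }

    // Drains the queue, sleeping between batches so other lock waiters can run.
    public void drain(long pauseMillis) throws InterruptedException {
        while (processOneBatch() > 0) {
            Thread.sleep(pauseMillis);
        }
    }

    public List<Long> invalidateList() { return invalidateList; }
    public int batches() { return batches; }

    public static void main(String[] args) throws InterruptedException {
        ThrottledExcessBlockProcessor p = new ThrottledExcessBlockProcessor(
                new ReentrantReadWriteLock(), id -> id % 2 == 0, 3);
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 10; i++) {
            ids.add(i);
        }
        p.enqueue(ids);
        p.drain(0);
        System.out.println(p.batches() + " " + p.invalidateList().size()); // prints "4 5"
    }
}
```

With 10 queued blocks and a batch size of 3, the lock is taken 4 times instead of being held once for the whole set; the pause passed to `drain` is the throttle. Unlike deferring to the mis-replicated blocks handling, the rate is under explicit control.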

I think the simplest remedy is to disable the state change logging for block 
invalidation during recommissioning. 

On a busy namenode, the overhead of logging every block state change may not be 
negligible. We might want to add the ability to selectively disable certain 
classes of state change logging. (There are already places where per-block 
logging is disabled.)
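One way to selectively disable a class of state change logging, sketched in Java under stated assumptions: the `SelectiveStateChangeLog` class, its `Category` enum and the category names are hypothetical (HDFS defines no such enum), the messages are illustrative, and an in-memory list stands in for the real log appender. Taking the message as a `Supplier` means a disabled category costs a set lookup instead of building a string for every block.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch; not part of HDFS. Not thread-safe: a real version
// would need e.g. a copy-on-write set of disabled categories.
public class SelectiveStateChangeLog {
    // Hypothetical categories of block state change messages.
    public enum Category { BLOCK_ALLOCATED, BLOCK_INVALIDATED, RECOMMISSION_INVALIDATED }

    private final Set<Category> disabled = EnumSet.noneOf(Category.class);
    private final List<String> sink = new ArrayList<>(); // stand-in for the log appender

    public void disable(Category c) { disabled.add(c); }
    public void enable(Category c) { disabled.remove(c); }

    // The message is only built if the category is enabled.
    public void log(Category c, Supplier<String> msg) {
        if (!disabled.contains(c)) {
            sink.add(c + ": " + msg.get());
        }
    }

    public List<String> emitted() { return sink; }

    public static void main(String[] args) {
        SelectiveStateChangeLog log = new SelectiveStateChangeLog();
        // e.g. silence only the invalidations triggered by recommissioning
        log.disable(Category.RECOMMISSION_INVALIDATED);
        for (long blk = 0; blk < 1000; blk++) {
            final long b = blk;
            log.log(Category.RECOMMISSION_INVALIDATED, () -> "blk_" + b + " (illustrative message)");
        }
        log.log(Category.BLOCK_INVALIDATED, () -> "blk_42 (illustrative message)");
        System.out.println(log.emitted().size()); // prints 1
    }
}
```

Disabling only `RECOMMISSION_INVALIDATED` matches the narrower remedy of turning off invalidation logging during recommissioning, while the enum leaves room for other classes later.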

                
> Reduce recommissioning overhead
> -------------------------------
>
>                 Key: HDFS-4075
>                 URL: https://issues.apache.org/jira/browse/HDFS-4075
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.4, 2.0.2-alpha
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>
> When datanodes are recommissioned, 
> {{BlockManager#processOverReplicatedBlocksOnReCommission()}} is called for each 
> rejoined node and excess blocks are added to the invalidate list. The problem 
> is that this is done while the namesystem write lock is held.

