[jira] [Commented] (HDFS-12049) Recommissioning live nodes stalls the NN

Sunil Govindan (JIRA) Mon, 17 Sep 2018 01:56:06 -0700


    [ https://issues.apache.org/jira/browse/HDFS-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617249#comment-16617249 ]


Sunil Govindan commented on HDFS-12049:
---------------------------------------

As the code freeze for 3.2 has passed, moving this Jira to 3.3. Please feel free
to revert if anyone has concerns. Thank you.

> Recommissioning live nodes stalls the NN
> ----------------------------------------
>
>                 Key: HDFS-12049
>                 URL: https://issues.apache.org/jira/browse/HDFS-12049
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> A node refresh will recommission included nodes that are alive and in the
> decommissioning or decommissioned state. The recommission scans all blocks
> on the node, finds over-replicated blocks, chooses an excess replica, and
> queues an invalidate.
> The process is expensive and is worsened by the overhead of storage types
> (even when they are not in use). It can be especially devastating because the
> write lock is held for the entire node refresh. _Recommissioning 67 nodes with
> ~500k blocks/node stalled RPC services for over 4 minutes._
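To make the cost structure in the description concrete, here is a minimal toy sketch in Python of the recommission path it describes: take the write lock once for the whole refresh, then for each node scan every block, detect over-replication, pick an excess replica, and queue an invalidate. Everything here is hypothetical and illustrative: `Block`, `ToyNamesystem`, `refresh_nodes`, `_recommission`, and `invalidate_queue` are not HDFS names (the real NameNode is Java), storage types are omitted entirely, and the "drop the lexicographically smallest holder" rule is only a placeholder for HDFS's block placement policy.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Block:
    """Hypothetical toy block: an id plus the set of datanodes holding a replica."""
    block_id: int
    replicas: Set[str]
    expected_replication: int = 3


class ToyNamesystem:
    """Hypothetical, heavily simplified stand-in for the NameNode (not HDFS code)."""

    def __init__(self, blocks_by_node: Dict[str, List[Block]]):
        self.write_lock = threading.Lock()    # stands in for the namesystem write lock
        self.blocks_by_node = blocks_by_node  # datanode id -> blocks stored on it
        self.admin_state: Dict[str, str] = {}
        self.invalidate_queue: List[Tuple[str, int]] = []  # (datanode, block_id)
        self.blocks_scanned = 0

    def _recommission(self, node: str) -> None:
        # Per the issue: scan every block on the node, find over-replicated
        # blocks, choose an excess replica, and queue an invalidate for it.
        self.admin_state[node] = "IN_SERVICE"
        for block in self.blocks_by_node.get(node, []):
            self.blocks_scanned += 1
            excess = len(block.replicas) - block.expected_replication
            if excess <= 0:
                continue
            # Placeholder policy: drop the lexicographically smallest holders.
            # Real HDFS delegates this choice to its block placement policy.
            for victim in sorted(block.replicas)[:excess]:
                block.replicas.discard(victim)
                self.invalidate_queue.append((victim, block.block_id))

    def refresh_nodes(self, nodes: List[str]) -> None:
        # The lock is taken once around the whole refresh, so hold time grows
        # with (nodes recommissioned) x (blocks per node); any other caller that
        # needs the lock waits until every node has been processed.
        with self.write_lock:
            for node in nodes:
                self._recommission(node)
```

For example, if `dn1` holds block 1 (four replicas, one extra copy made while `dn1` was decommissioning) and block 2 (three replicas), `refresh_nodes(["dn1"])` scans both blocks and queues a single invalidate, `("dn1", 1)`. With the figures from the report, one refresh of 67 nodes at ~500k blocks each runs the per-block loop roughly 67 × 500,000 ≈ 33.5 million times without ever releasing the lock.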



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
