[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

Stephen O'Donnell (Jira) Wed, 09 Oct 2019 02:48:54 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947495#comment-16947495
 ]


Stephen O'Donnell commented on HDFS-14854:
------------------------------------------

It turned out to be fairly easy to take the monitor classes out of the 
DatanodeAdminManager class, but it does make it difficult to see the changes 
made to the original monitor to allow this to happen.

Basically I had to move "pendingNodes" into the monitor classes and have 
DatanodeAdminManager call the monitor to add and remove nodes from it. I also 
needed to pass namesystem, blockManager and the DatanodeAdminManager to the 
monitor constructor so the monitors can access these things, which were 
previously shared with the DatanodeAdminManager class.
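
To make the wiring above easier to picture, here is a minimal sketch. It is 
not taken from the patch: every class and method name below 
(SketchAdminManager, SketchMonitor, startTrackingNode and so on) is a 
hypothetical stand-in, and the real classes carry locking and state that are 
left out here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-ins for the real HDFS types; only the wiring matters here.
class FakeNamesystem {}
class FakeBlockManager {}

class NodeInfo {
  final String name;
  NodeInfo(String name) { this.name = name; }
}

/** Hypothetical monitor contract: the monitor, not the manager, owns pendingNodes. */
interface SketchMonitor extends Runnable {
  void startTrackingNode(NodeInfo dn);
  void stopTrackingNode(NodeInfo dn);
  int getPendingNodeCount();
}

class SketchDefaultMonitor implements SketchMonitor {
  // Handed in through the constructor instead of being reached via an outer class.
  private final FakeNamesystem namesystem;
  private final FakeBlockManager blockManager;
  private final SketchAdminManager adminManager;
  // The queue that used to live in the manager now lives in the monitor.
  private final Queue<NodeInfo> pendingNodes = new ArrayDeque<>();

  SketchDefaultMonitor(FakeNamesystem namesystem, FakeBlockManager blockManager,
      SketchAdminManager adminManager) {
    this.namesystem = namesystem;
    this.blockManager = blockManager;
    this.adminManager = adminManager;
  }

  @Override
  public void startTrackingNode(NodeInfo dn) { pendingNodes.add(dn); }

  @Override
  public void stopTrackingNode(NodeInfo dn) { pendingNodes.remove(dn); }

  @Override
  public int getPendingNodeCount() { return pendingNodes.size(); }

  @Override
  public void run() {
    // A real monitor would check replication progress of tracked nodes here.
  }
}

class SketchAdminManager {
  private final SketchMonitor monitor;

  SketchAdminManager(FakeNamesystem namesystem, FakeBlockManager blockManager) {
    // The manager passes itself and the shared objects to the monitor.
    this.monitor = new SketchDefaultMonitor(namesystem, blockManager, this);
  }

  // The manager no longer touches the queue; it delegates to the monitor.
  void startDecommission(NodeInfo dn) { monitor.startTrackingNode(dn); }
  void stopDecommission(NodeInfo dn) { monitor.stopTrackingNode(dn); }
  int getNumPendingNodes() { return monitor.getPendingNodeCount(); }
}
```

The manager only ever talks to the SketchMonitor interface after 
construction, which is what lets a different monitor be dropped in without 
touching the manager's add and remove paths.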

I think this change is a positive one, and it does make the monitor fully 
pluggable.
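
As a rough illustration of what "pluggable" can mean here, the sketch below 
picks a monitor implementation by a configured name and falls back to the 
default one. The registry, the names "default" and "backoff" as keys, and all 
class names are assumptions made for the example; the patch may well select 
the implementation differently, for instance by class name from the 
configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical minimal monitor contract, for illustration only. */
interface PluggableMonitor {
  String describe();
}

class DefaultPluggableMonitor implements PluggableMonitor {
  @Override
  public String describe() { return "default"; }
}

class BackoffPluggableMonitor implements PluggableMonitor {
  @Override
  public String describe() { return "backoff"; }
}

class MonitorFactory {
  // Maps a configured name to a constructor for the matching implementation.
  private static final Map<String, Supplier<PluggableMonitor>> REGISTRY = new HashMap<>();

  static {
    REGISTRY.put("default", DefaultPluggableMonitor::new);
    REGISTRY.put("backoff", BackoffPluggableMonitor::new);
  }

  /** Returns the configured monitor, or the default one for unknown names. */
  static PluggableMonitor create(String configuredName) {
    return REGISTRY.getOrDefault(configuredName, DefaultPluggableMonitor::new).get();
  }
}
```

Calling MonitorFactory.create("backoff") yields the backoff variant, while an 
unrecognised name falls back to the default monitor, so the existing 
behaviour stays available.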

All existing decommission tests pass locally, and my additional test classes, 
TestDecommissioningStatusWithBackoffMonitor and 
TestDecommissionWithBackoffMonitor, which reuse the original decommission 
tests, are passing too.

> Create improved decommission monitor implementation
> ---------------------------------------------------
>
>                 Key: HDFS-14854
>                 URL: https://issues.apache.org/jira/browse/HDFS-14854
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
