[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

Stephen O'Donnell (Jira) Wed, 09 Oct 2019 10:36:16 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947876#comment-16947876
 ]


Stephen O'Donnell commented on HDFS-14854:
------------------------------------------

I will upload a new patch that addresses the FindBugs warnings and hopefully 
most of the style issues, but some keys are too long to fit on one line, so we 
cannot get rid of all the checkstyle warnings.

{quote}

Can DatanodeAdminBackoffMonitor extend DatanodeAdminDefaultMonitor? I haven't 
checked too carefully but it looks like they share a base.

{quote}

They do have some copied and pasted code. I need to see how much they can 
actually share. If we have a long-term goal of removing the original monitor, 
then maybe it's best that they are completely separate, or that they both 
inherit from an abstract base class. I will look into it some more.
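
To make the abstract base class option concrete, here is a minimal sketch. All class and method names are hypothetical and the node names are plain strings standing in for the real datanode objects; this is not code from the patch, only an illustration of where the shared code could live.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical shared base: state and bookkeeping both monitors would need.
abstract class MonitorBaseSketch implements Runnable {
  // Node names stand in for the real datanode objects.
  protected final Queue<String> pendingNodes = new ArrayDeque<>();
  private int nodesChecked = 0;

  public void startTrackingNode(String node) {
    pendingNodes.add(node);
  }

  public int getPendingNodeCount() {
    return pendingNodes.size();
  }

  public int getNodesChecked() {
    return nodesChecked;
  }

  // One monitor pass: drain the queue, delegating the per-node work.
  @Override
  public final void run() {
    String node;
    while ((node = pendingNodes.poll()) != null) {
      processNode(node);
      nodesChecked++;
    }
  }

  // The only part each implementation has to supply.
  protected abstract void processNode(String node);
}

class DefaultMonitorSketch extends MonitorBaseSketch {
  @Override
  protected void processNode(String node) {
    // Placeholder for the original node-by-node scan.
  }
}

class BackoffMonitorSketch extends MonitorBaseSketch {
  @Override
  protected void processNode(String node) {
    // Placeholder for the new queue-limited processing.
  }
}
```

With this shape neither monitor extends the other, so the original one could later be deleted without touching the new one.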

I have added the Javadoc and changed the definition of pendingRep to be a 
map<>.

{quote}

The initializer of the monitor in DatanodeAdminManager does a log and throw; 
following [~belugabehr]'s work, we should avoid this.

{quote}

Can you expand on this, as I am not sure what David changed or suggested for 
this type of scenario? If the monitor fails to initialize we certainly want to 
throw, so is the point that we do not need the log message as well, because 
the exception will be logged elsewhere?
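
For reference, one common reading of "avoid log and throw" is sketched below: the catch block attaches context and the original cause to the exception it throws, and leaves the logging to whoever finally handles it. This is an assumption about what was meant, not something confirmed in this thread, and the class and method names are hypothetical.

```java
// Hypothetical helper showing "throw with context, do not also log".
class MonitorInitSketch {
  static Object createMonitor(String className) {
    try {
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // No log call here: the message and cause travel with the exception,
      // so the caller can log the failure exactly once.
      throw new IllegalStateException(
          "Unable to create decommission monitor " + className, e);
    }
  }
}
```

The failure is still fatal for the caller, but it shows up once in the log with the full cause chain instead of twice.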

{quote}

I still think we can use ReflectionUtils#newInstance now that we have this 
refactor. We can use setConf() and supply the BlockManager and 
DatanodeAdminManager through setters.

{quote}

Do you know of any good examples where ReflectionUtils is used this way? I had 
a quick look at the ReflectionUtils class and it's not clear how to use it, but 
I did not spend much time on it.
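
As I understand the suggestion, ReflectionUtils#newInstance creates the object through a no-argument constructor and then calls setConf() on it if it is Configurable, so the remaining collaborators would have to be passed through setters. The sketch below imitates that pattern with plain java.lang.reflect so it is self-contained. The interfaces, the config key, and the use of a Map in place of Configuration are all stand-ins I made up for illustration, not Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a Configurable-style interface (hypothetical).
interface ConfigurableSketch {
  void setConf(Map<String, String> conf);
}

// Hypothetical monitor interface: collaborators arrive through setters
// because the instance is built with a no-argument constructor.
interface AdminMonitorSketch extends ConfigurableSketch {
  void setBlockManager(Object blockManager);
  void setDatanodeAdminManager(Object dnAdmin);
  boolean isReady();
}

class ReflectiveBackoffMonitorSketch implements AdminMonitorSketch {
  private Map<String, String> conf;
  private Object blockManager;
  private Object dnAdmin;

  @Override
  public void setConf(Map<String, String> conf) { this.conf = conf; }

  @Override
  public void setBlockManager(Object blockManager) {
    this.blockManager = blockManager;
  }

  @Override
  public void setDatanodeAdminManager(Object dnAdmin) {
    this.dnAdmin = dnAdmin;
  }

  @Override
  public boolean isReady() {
    return conf != null && blockManager != null && dnAdmin != null;
  }
}

class MonitorFactorySketch {
  // Hypothetical key naming the monitor implementation to load.
  static final String MONITOR_CLASS_KEY = "example.decommission.monitor.class";

  static AdminMonitorSketch create(Map<String, String> conf,
      Object blockManager, Object dnAdmin) {
    String name = conf.getOrDefault(MONITOR_CLASS_KEY,
        ReflectiveBackoffMonitorSketch.class.getName());
    try {
      // Step 1: no-argument construction by class name.
      AdminMonitorSketch monitor = Class.forName(name)
          .asSubclass(AdminMonitorSketch.class)
          .getDeclaredConstructor()
          .newInstance();
      // Step 2: configuration first, then the remaining collaborators.
      monitor.setConf(conf);
      monitor.setBlockManager(blockManager);
      monitor.setDatanodeAdminManager(dnAdmin);
      return monitor;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Unable to create monitor " + name, e);
    }
  }
}
```

The cost of this style is that the monitor exists briefly in a half-initialized state, which is why the sketch exposes isReady().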

 

 

> Create improved decommission monitor implementation
> ---------------------------------------------------
>
>                 Key: HDFS-14854
>                 URL: https://issues.apache.org/jira/browse/HDFS-14854
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before 
> they are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
