[ https://issues.apache.org/jira/browse/HDFS-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918836#comment-16918836 ]

David Mollitor edited comment on HDFS-13157 at 8/29/19 6:26 PM:
----------------------------------------------------------------

I still need to validate that blocks are actually being deleted in order, but 
it seems likely.

Note to self:

Consider doing the shuffle here:

[https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L729-L730]


However, it's probably not that simple. We may have to shuffle on queue insert, 
or do something else less straightforward. The invalidated blocks are reported 
to the DataNodes in batches. If the queue is loaded by iterating over each 
volume ID, then, viewed by volume ID, the queue will look something like:

<1,1,1,2,2,2,3,3,3>

(1,1,1),(2,2,2),(3,3,3)

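In this scenario, if the blocks are sent to the DataNode in batches of 3, 
shuffling each individual batch will not yield the required result: every 
batch still contains blocks from a single volume, so the shuffle only reorders 
blocks that all live on that one volume.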
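To make the batching problem concrete, here is a small, self-contained Java sketch. It is a simplified model, not HDFS code: each block is represented only by its volume ID, and the names `loadQueueByVolume` and `shuffleEachBatch` are hypothetical. It loads the queue volume by volume, cuts it into batches of 3, and shuffles each batch on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class BatchShuffleDemo {

  /** Fills the queue volume by volume; each block is modeled only by its volume ID. */
  static List<Integer> loadQueueByVolume(int volumes, int blocksPerVolume) {
    List<Integer> queue = new ArrayList<>();
    for (int vol = 1; vol <= volumes; vol++) {
      for (int i = 0; i < blocksPerVolume; i++) {
        queue.add(vol);
      }
    }
    return queue;
  }

  /** Cuts the queue into fixed-size batches and shuffles each batch independently. */
  static List<List<Integer>> shuffleEachBatch(List<Integer> queue, int batchSize, Random rnd) {
    List<List<Integer>> batches = new ArrayList<>();
    for (int start = 0; start < queue.size(); start += batchSize) {
      int end = Math.min(start + batchSize, queue.size());
      List<Integer> batch = new ArrayList<>(queue.subList(start, end));
      Collections.shuffle(batch, rnd); // reorders only within this batch
      batches.add(batch);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Integer> queue = loadQueueByVolume(3, 3); // <1,1,1,2,2,2,3,3,3>
    List<List<Integer>> batches = shuffleEachBatch(queue, 3, new Random(42));
    // Prints [[1, 1, 1], [2, 2, 2], [3, 3, 3]]: a per-batch shuffle cannot mix
    // volumes, so every batch still targets a single volume.
    System.out.println(batches);
  }
}
```

Whatever the seed, the output is the same, which is the point: the ordering has to change before the list is cut into batches (or when blocks are inserted) for the load to spread across volumes.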

Another thought: it may also be possible to re-implement this to store a 
Map-of-Queues, with one queue per volume ID. The list of invalidated blocks 
could then be generated by pulling from each queue in round-robin order to 
distribute the load:

(1,2,3),(1,2,3),(1,2,3)
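A minimal sketch of the Map-of-Queues idea, assuming blocks can be grouped by volume ID at the time they are queued. `RoundRobinInvalidateQueue`, `add`, and `poll` are hypothetical names, blocks are plain strings, and none of this is the existing `DatanodeDescriptor` API. A rotation deque carries the round-robin position across calls, so volumes stay evenly served even when the batch size is not a multiple of the number of volumes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RoundRobinInvalidateQueue {
  // One FIFO of pending invalidated blocks per volume ID (hypothetical structure).
  private final Map<Integer, Deque<String>> queuesByVolume = new HashMap<>();
  // Volumes that still have pending blocks, in the order they will be visited next.
  private final Deque<Integer> rotation = new ArrayDeque<>();

  /** Queues a block on the FIFO belonging to its volume. */
  void add(int volumeId, String block) {
    Deque<String> q = queuesByVolume.get(volumeId);
    if (q == null) {
      q = new ArrayDeque<>();
      queuesByVolume.put(volumeId, q);
      rotation.addLast(volumeId); // new volume joins the end of the rotation
    }
    q.addLast(block);
  }

  boolean isEmpty() {
    return rotation.isEmpty();
  }

  /** Takes up to maxBlocks blocks, one from each volume in turn. */
  List<String> poll(int maxBlocks) {
    List<String> batch = new ArrayList<>();
    while (batch.size() < maxBlocks && !rotation.isEmpty()) {
      int vol = rotation.pollFirst();
      Deque<String> q = queuesByVolume.get(vol);
      batch.add(q.pollFirst());
      if (q.isEmpty()) {
        queuesByVolume.remove(vol); // volume drained, drop it from the rotation
      } else {
        rotation.addLast(vol); // send it to the back so the next volume goes first
      }
    }
    return batch;
  }

  public static void main(String[] args) {
    RoundRobinInvalidateQueue queue = new RoundRobinInvalidateQueue();
    // Load volume by volume, exactly like the <1,1,1,2,2,2,3,3,3> case.
    for (int vol = 1; vol <= 3; vol++) {
      for (int b = 0; b < 3; b++) {
        queue.add(vol, "vol" + vol + "-blk" + b);
      }
    }
    while (!queue.isEmpty()) {
      System.out.println(queue.poll(3));
    }
  }
}
```

Even though the blocks are inserted volume by volume, `main` prints `[vol1-blk0, vol2-blk0, vol3-blk0]`, then `[vol1-blk1, vol2-blk1, vol3-blk1]`, then `[vol1-blk2, vol2-blk2, vol3-blk2]`, i.e. the (1,2,3),(1,2,3),(1,2,3) pattern.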



> Do Not Remove Blocks Sequentially During Decommission 
> ------------------------------------------------------
>
>                 Key: HDFS-13157
>                 URL: https://issues.apache.org/jira/browse/HDFS-13157
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>
> From what I understand of [DataNode decommissioning|https://github.com/apache/hadoop/blob/42a1c98597e6dba2e371510a6b2b6b1fb94e4090/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java]
> it appears that all the blocks are scheduled for removal _in order_. I'm 
> not 100% sure what the exact ordering is, but I think it loops through each 
> data volume and schedules each block to be replicated elsewhere. The net 
> effect is that during a decommission, all of the DataNode transfer threads 
> slam a single volume until it is cleaned out, at which point they all 
> slam the next volume, and so on.
> Please randomize the block list so that the load is distributed more evenly 
> across all volumes when decommissioning a node.
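The randomization requested in the description could look roughly like the following sketch, which shuffles the whole block list once, before it is split into batches. This is a hypothetical model (blocks reduced to volume IDs, `randomizedBatches` is an illustrative name), not the `DatanodeAdminManager` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleBeforeBatching {

  /** Shuffles the full list once, then cuts it into batches of batchSize. */
  static List<List<Integer>> randomizedBatches(List<Integer> blocksInVolumeOrder, int batchSize, Random rnd) {
    List<Integer> shuffled = new ArrayList<>(blocksInVolumeOrder); // leave the input untouched
    Collections.shuffle(shuffled, rnd); // one shuffle across all volumes
    List<List<Integer>> batches = new ArrayList<>();
    for (int start = 0; start < shuffled.size(); start += batchSize) {
      int end = Math.min(start + batchSize, shuffled.size());
      batches.add(new ArrayList<>(shuffled.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    // Blocks in the per-volume order the description suspects.
    List<Integer> blocks = Arrays.asList(1, 1, 1, 2, 2, 2, 3, 3, 3);
    System.out.println(randomizedBatches(blocks, 3, new Random()));
  }
}
```

One design note: a random shuffle only spreads the load on average, so an individual batch can still happen to draw several blocks from the same volume; the round-robin Map-of-Queues idea from the comment above guarantees the mix instead, at the cost of grouping blocks by volume up front.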



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
