[ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643605#action_12643605 ]

Tsz Wo (Nicholas), SZE commented on HADOOP-4483:
------------------------------------------------

+1 patch looks good.

> getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-4483
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4483
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.1
>         Environment: hadoop-0.18.1 running on a cluster of 16 nodes.
>            Reporter: Ahad Rana
>            Priority: Critical
>             Fix For: 0.18.2
>
>         Attachments: HADOOP-4483-v2.patch, HADOOP-4483-v3.patch, 
> HADOOP-4483-v3.patch, invalidateBlocksCopy.patch, patch.HADOOP-4483
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The getBlockArray method in DatanodeDescriptor.java should honor the 
> passed-in maxblocks parameter. In its current form it passes an array 
> sized to min(maxblocks, blocks.size()) to the Collection.toArray method. 
> As the javadoc for Collection.toArray indicates, toArray may discard the 
> passed-in array (and allocate a new one) if the number of elements 
> returned by the iterator exceeds the size of that array (see the sketch 
> below). As a result, the flawed implementation returns all of a DataNode's 
> invalid blocks in one go, causing the NameNode to send the DataNode a 
> DNA_INVALIDATE command with an excessively large number of blocks. This 
> INVALIDATE command can, in turn, take a very long time to process at the 
> DataNode, and since DatanodeCommand(s) are processed between heartbeats, 
> the NameNode then considers the DataNode offline / unresponsive (due to a 
> lack of heartbeats). 
> In our use case at CommonCrawl.org, we regularly perform large-scale HDFS 
> file deletions after certain stages of our map-reduce pipeline. These 
> deletes would make certain DataNode(s) unresponsive and thus impair the 
> cluster's ability to properly balance file-system reads / writes across 
> the available nodes. This problem only surfaced once we migrated from our 
> 0.16.2 deployment to the current 0.18.1 release. 
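
The failure mode described in the report comes down to how 
Collection.toArray(T[]) treats an undersized destination array. The sketch 
below is only an illustration, not the actual DatanodeDescriptor code: the 
class and method names (ToArraySketch, getBlockArrayBuggy, 
getBlockArrayCapped) are hypothetical, and Long stands in for Block. The 
real method also synchronizes on the collection, which is omitted here.

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

public class ToArraySketch {

  // Buggy pattern: sizing the destination array to min(maxblocks, size)
  // does not cap the result. Collection.toArray(T[]) allocates a new,
  // larger array whenever the collection holds more elements than the
  // array passed in, so every block is returned anyway.
  static Long[] getBlockArrayBuggy(Collection<Long> blocks, int maxblocks) {
    int n = Math.min(maxblocks, blocks.size());
    return blocks.toArray(new Long[n]); // may return blocks.size() elements
  }

  // One way to honor maxblocks: copy at most n elements explicitly and
  // remove only the blocks that were actually handed out.
  static Long[] getBlockArrayCapped(Collection<Long> blocks, int maxblocks) {
    int n = Math.min(maxblocks, blocks.size());
    Long[] result = new Long[n];
    Iterator<Long> it = blocks.iterator();
    for (int i = 0; i < n; i++) {
      result[i] = it.next();
      it.remove(); // consume only what we return
    }
    return result;
  }

  public static void main(String[] args) {
    Collection<Long> blocks = new ArrayList<Long>();
    for (long b = 0; b < 10000; b++) {
      blocks.add(b);
    }
    System.out.println(getBlockArrayBuggy(blocks, 100).length);  // prints 10000
    System.out.println(getBlockArrayCapped(blocks, 100).length); // prints 100
  }
}
{code}

Copying at most n elements and removing only those keeps the remaining 
blocks queued for a later heartbeat instead of invalidating them all at once.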

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
