[
https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641614#action_12641614
]
Tsz Wo (Nicholas), SZE commented on HADOOP-4483:
------------------------------------------------
Good catch on the bug.
- Removing the elements from the collection one by one could be expensive, and
in most cases we have max >= n. How about keeping the existing code
(i.e. blocks.clear() instead of e.remove()) when max >= n?
- Could you also remove the whitespace changes, tabs and trailing spaces?
That will keep the code style consistent.
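For illustration only, a rough sketch of the approach suggested above: keep the
bulk toArray + clear path when max >= n, and remove element by element only
when max < n. The Block class and the method signature here are stand-ins, not
the actual DatanodeDescriptor code or the attached patch.

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;

public class GetBlockArraySketch {

  /** Stand-in for the real Block class; used only in this sketch. */
  static class Block {
    final long id;
    Block(long id) { this.id = id; }
  }

  /**
   * Sketch of a getBlockArray that honors max: when max >= n, keep the
   * existing bulk path (toArray + clear); when max < n, copy only max
   * blocks and remove each one through the iterator.
   */
  static Block[] getBlockArray(Collection<Block> blocks, int max) {
    int n = blocks.size();
    if (max <= 0 || n == 0) {
      return null;
    }
    if (max >= n) {
      // The array is large enough to hold everything, so toArray cannot
      // allocate a bigger one; clear the collection in a single call.
      Block[] all = blocks.toArray(new Block[n]);
      blocks.clear();
      return all;
    }
    // max < n: return only max blocks, removing them as they are copied.
    Block[] some = new Block[max];
    Iterator<Block> it = blocks.iterator();
    for (int i = 0; i < max && it.hasNext(); i++) {
      some[i] = it.next();
      it.remove();
    }
    return some;
  }

  public static void main(String[] args) {
    Collection<Block> invalidBlocks = new ArrayList<Block>();
    for (long i = 0; i < 10; i++) {
      invalidBlocks.add(new Block(i));
    }
    Block[] batch = getBlockArray(invalidBlocks, 3);
    System.out.println(batch.length + " returned, "
        + invalidBlocks.size() + " left");   // 3 returned, 7 left
  }
}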
> getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value
> ----------------------------------------------------------------------------
>
> Key: HADOOP-4483
> URL: https://issues.apache.org/jira/browse/HADOOP-4483
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.1
> Environment: hadoop-0.18.1 running on a cluster of 16 nodes.
> Reporter: Ahad Rana
> Priority: Critical
> Fix For: 0.18.2
>
> Attachments: patch.HADOOP-4483
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The getBlockArray method in DatanodeDescriptor.java should honor the passed-in
> maxblocks parameter. In its current form it passes an array sized to
> min(maxblocks, blocks.size()) to Collection.toArray. As the javadoc for
> Collection.toArray indicates, toArray may discard the passed-in array (and
> allocate a new array) if the number of elements returned by the iterator
> exceeds the size of the passed-in array. As a result, the flawed
> implementation returns all the invalid blocks for a DataNode in one go, and
> thus triggers the NameNode to send a DNA_INVALIDATE command to the DataNode
> with an excessively large number of blocks. This INVALIDATE command, in turn,
> can take a very long time to process at the DataNode, and since
> DatanodeCommand(s) are processed in between heartbeats at the DataNode, the
> NameNode may then consider the DataNode offline / unresponsive (due to the
> lack of heartbeats).
> In our use case at CommonCrawl.org, we regularly do large-scale HDFS file
> deletions after certain stages of our map-reduce pipeline. These deletes
> would make certain DataNode(s) unresponsive, and thus impair the cluster's
> ability to properly balance file-system reads / writes across the whole
> available cluster. This problem only surfaced once we migrated from our
> 0.16.2 deployment to the current 0.18.1 release.
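For reference, a minimal standalone illustration (plain Java, not the Hadoop
code) of the Collection.toArray(T[]) behavior described in the report: sizing
the passed-in array to min(maxblocks, size) does not cap the result, because
toArray allocates a larger array whenever the collection does not fit.

import java.util.ArrayList;
import java.util.Collection;

public class ToArraySizingSketch {
  public static void main(String[] args) {
    Collection<Long> invalidBlockIds = new ArrayList<Long>();
    for (long i = 0; i < 1000; i++) {
      invalidBlockIds.add(i);
    }

    int maxblocks = 100;
    // Array sized to min(maxblocks, size) = 100, but the collection holds
    // 1000 elements, so toArray ignores this array and allocates a new one
    // containing all 1000 entries.
    Long[] result = invalidBlockIds.toArray(
        new Long[Math.min(maxblocks, invalidBlockIds.size())]);

    System.out.println(result.length);   // prints 1000, not 100
  }
}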
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.