[
https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641630#action_12641630
]
Ahad Rana commented on HADOOP-4483:
-----------------------------------
Re: Removing the elements from a collection one by one could be expensive.
Also, we have max >= n in most cases. How about using the existing code
(i.e. blocks.clear() instead of e.remove()) when max >= n?
I believe that since the underlying container is a tree, even Collection's
internal code needs to use the iterator approach to remove items from the data
structure. Plus, in the standard configuration, where the heartbeat interval is
set to 3 seconds, I believe max blocks <= 100. Better to stick to one code path
in this instance.
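To make the single-code-path point concrete, here is a minimal sketch of
iterator-based bounded removal. The class and element types are illustrative
stand-ins, not the actual DatanodeDescriptor code:

{code}
import java.util.Iterator;
import java.util.TreeSet;

// Illustrative only: a TreeSet<Long> of block IDs stands in for the real
// tree-backed block collection in DatanodeDescriptor.
public class BoundedDrain {

  // Copy out and remove at most maxBlocks elements, using one code path
  // regardless of whether maxBlocks >= blocks.size().
  static long[] drain(TreeSet<Long> blocks, int maxBlocks) {
    int n = Math.min(maxBlocks, blocks.size());
    long[] out = new long[n];
    Iterator<Long> it = blocks.iterator();
    for (int i = 0; i < n; i++) {
      out[i] = it.next();
      it.remove(); // remove as we copy, one element at a time
    }
    return out;
  }

  public static void main(String[] args) {
    TreeSet<Long> blocks = new TreeSet<Long>();
    for (long id = 0; id < 10; id++) {
      blocks.add(id);
    }
    System.out.println(drain(blocks, 4).length); // 4
    System.out.println(blocks.size());           // 6 left for the next heartbeat
  }
}
{code}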
Re: Could you also remove the whitespace changes, tabs and the trailing
spaces? This will keep the code in the same style.
Sorry, one of my editors must be set to tabs instead of spaces, or vice
versa. Am I correct in assuming that the convention for the Hadoop codebase
is spaces (instead of tabs)?
> getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value
> ----------------------------------------------------------------------------
>
> Key: HADOOP-4483
> URL: https://issues.apache.org/jira/browse/HADOOP-4483
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.1
> Environment: hadoop-0.18.1 running on a cluster of 16 nodes.
> Reporter: Ahad Rana
> Priority: Critical
> Fix For: 0.18.2
>
> Attachments: patch.HADOOP-4483
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The getBlockArray method in DatanodeDescriptor.java should honor the passed
> in maxblocks parameter. In its current form it passes an array sized to
> min(maxblocks, blocks.size()) into the Collection.toArray method. As the
> javadoc for Collection.toArray indicates, the toArray method may discard the
> passed-in array (and allocate a new one) if the number of elements returned
> by the iterator exceeds the size of the passed-in array (see the sketch
> below). As a result, the flawed implementation of this method would return
> all the invalid blocks for a data node in one go, and thus trigger the
> NameNode to send a DNA_INVALIDATE command to the DataNode with an
> excessively large number of blocks. This INVALIDATE command, in turn, could
> take a very long time to process at the DataNode, and since
> DatanodeCommand(s) are processed in between heartbeats at the DataNode, this
> could lead the NameNode to consider the DataNode offline / unresponsive (due
> to a lack of heartbeats).
> In our use-case at CommonCrawl.org, we regularly do large-scale HDFS file
> deletions after certain stages of our map-reduce pipeline. These deletes
> would make certain DataNode(s) unresponsive, and thus impair the cluster's
> ability to properly balance file-system reads / writes across the available
> nodes. This problem only surfaced once we migrated from our 0.16.2
> deployment to the current 0.18.1 release.
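For reference, below is a minimal, self-contained sketch of the
Collection.toArray(T[]) behavior described in the issue above. Integer block
IDs stand in for the real Block objects:

{code}
import java.util.TreeSet;

// Demonstrates that Collection.toArray(T[]) does not truncate: when the
// collection holds more elements than the supplied array, it allocates and
// returns a new, larger array, so sizing the array to maxblocks does not
// cap the result.
public class ToArrayDemo {
  public static void main(String[] args) {
    TreeSet<Integer> blocks = new TreeSet<Integer>();
    for (int i = 0; i < 500; i++) {
      blocks.add(i);
    }

    int maxblocks = 100;
    Integer[] sized = new Integer[Math.min(maxblocks, blocks.size())];
    Integer[] result = blocks.toArray(sized);

    System.out.println(sized.length);    // 100
    System.out.println(result.length);   // 500 -- the maxblocks cap was ignored
    System.out.println(result == sized); // false -- toArray allocated a new array
  }
}
{code}

A fix needs to copy at most maxblocks entries explicitly (for example via the
iterator, as in the sketch in the comment above) rather than relying on
toArray to respect the length of the supplied array.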
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.