[ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669800#action_12669800 ]
Konstantin Shvachko commented on HADOOP-5124:
---------------------------------------------
# {{computeInvalidateWork()}}
## You probably want to use {{Math.min()}} in computing the value of
{{nodesToProcess}}
## I would rather go with
{{ArrayList<String> keyArray = new ArrayList<String>(recentInvalidateSets.keySet());}}
than {{String[] keyArray}}. You will then be able to use {{Collections.swap()}}
instead of implementing the swap yourself.
Ideally, of course, it would be better to just pick a random element from the
TreeMap and put it into the array list.
# {{invalidateWorkForOneNode()}}
{code}
if(it.hasNext())
recentInvalidateSets.put(firstNodeId, invalidateSet);
{code}
is a no-op in your case, because {{recentInvalidateSets}} already maps
{{firstNodeId}} to exactly this {{invalidateSet}}, which was modified in place
earlier in the loop.
The original variant of this code
{code}
if(!it.hasNext())
recentInvalidateSets.remove(nodeId);
{code}
makes more sense, since we remove the node's entry entirely once it has no
invalid blocks left.
# Could you please run some tests showing how much improvement we actually get
from randomizing the data-node selection?
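
To make the first two points concrete, here is a rough sketch of what I have in mind. It is not the patch: the class name {{InvalidateSketch}}, the method signatures, the {{String}} node IDs and {{Long}} block IDs are all assumptions for illustration, and the map is passed in rather than being a field of the namesystem. It combines {{Math.min()}}, an {{ArrayList}} with {{Collections.swap()}} (a partial shuffle), and removal of the node entry once its set is drained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvalidateSketch {

    /**
     * Picks up to maxNodes distinct node IDs at random using a partial
     * shuffle. Hypothetical helper, not the actual computeInvalidateWork().
     */
    static List<String> pickRandomNodes(Map<String, ? extends Collection<Long>> sets,
                                        int maxNodes, Random rnd) {
        List<String> keyArray = new ArrayList<String>(sets.keySet());
        // Never try to process more nodes than there are keys.
        int nodesToProcess = Math.min(keyArray.size(), Math.max(0, maxNodes));
        for (int i = 0; i < nodesToProcess; i++) {
            // Choose a random index in [i, size) and move it into slot i.
            int j = i + rnd.nextInt(keyArray.size() - i);
            Collections.swap(keyArray, i, j);
        }
        return new ArrayList<String>(keyArray.subList(0, nodesToProcess));
    }

    /**
     * Takes up to limit blocks from the node's set and removes the node's
     * entry from the map once no blocks remain. Hypothetical helper.
     */
    static List<Long> invalidateWorkForOneNode(Map<String, Collection<Long>> sets,
                                               String nodeId, int limit) {
        List<Long> blocksToInvalidate = new ArrayList<Long>();
        Collection<Long> invalidateSet = sets.get(nodeId);
        if (invalidateSet == null) {
            return blocksToInvalidate;
        }
        Iterator<Long> it = invalidateSet.iterator();
        while (it.hasNext() && blocksToInvalidate.size() < limit) {
            blocksToInvalidate.add(it.next());
            it.remove(); // modifies the set that is already stored in the map
        }
        // No put() is needed for the non-empty case; only the empty case matters.
        if (!it.hasNext()) {
            sets.remove(nodeId);
        }
        return blocksToInvalidate;
    }

    public static void main(String[] args) {
        Map<String, Collection<Long>> sets = new TreeMap<String, Collection<Long>>();
        sets.put("dn1", new TreeSet<Long>(Arrays.asList(1L, 2L, 3L)));
        sets.put("dn2", new TreeSet<Long>(Arrays.asList(4L)));
        sets.put("dn3", new TreeSet<Long>(Arrays.asList(5L, 6L)));

        for (String nodeId : pickRandomNodes(sets, 2, new Random())) {
            List<Long> blocks = invalidateWorkForOneNode(sets, nodeId, 2);
            System.out.println(nodeId + " -> " + blocks);
        }
        System.out.println("remaining nodes: " + sets.keySet());
    }
}
```

Note that copying the key set is still linear in the number of nodes on every call, which is why picking a random element directly from the map would be preferable if it can be done cheaply.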
> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
> Key: HADOOP-5124
> URL: https://issues.apache.org/jira/browse/HADOOP-5124
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Fix For: 0.21.0
>
> Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, it does not traverse all nodes in
> the map. Instead, it traverses only the nodes on which the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, it randomly
> chooses a predefined number of datanodes and dispatches blocks to those
> datanodes. This strategy provides fairness to all datanodes. The current
> strategy always starts from the first datanode.