[ https://issues.apache.org/jira/browse/HADOOP-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669800#action_12669800 ]

Konstantin Shvachko commented on HADOOP-5124:
---------------------------------------------

# {{computeInvalidateWork()}}
## You probably want to use {{Math.min()}} in computing the value of 
{{nodesToProcess}}
## I would rather go with {{ArrayList<String> keyArray = new ArrayList<String>(recentInvalidateSets.keySet());}} than {{String[] keyArray}}. That way you can use {{Collections.swap()}} instead of implementing the swap yourself.
Ideally, of course, it would be better to just get a random element from the TreeMap and put it into the array list.
# {{invalidateWorkForOneNode()}}
{code}
    if(it.hasNext())
      recentInvalidateSets.put(firstNodeId, invalidateSet);
{code}
is a no-op in your case, because {{recentInvalidateSets}} already maps {{firstNodeId}} to exactly this {{invalidateSet}}, which was modified earlier in the loop.
The original variant of this code
{code}
    if(!it.hasNext())
      recentInvalidateSets.remove(nodeId);
{code}
makes more sense, since it removes the node entirely once it has no invalid blocks left.
# Could you please run some tests showing how much optimization we get from randomizing the data-node selection.
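For reference, point 1.2 could look something like the following. This is only a minimal, self-contained sketch; the map contents, the value of {{nodesToProcess}}, and the class name are made up for illustration and do not reflect the actual FsNamesystem code:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;
import java.util.TreeSet;

public class RandomKeyPick {
  public static void main(String[] args) {
    // Hypothetical stand-in for recentInvalidateSets (node id -> block names).
    TreeMap<String, TreeSet<String>> recentInvalidateSets = new TreeMap<>();
    recentInvalidateSets.put("node1", new TreeSet<>(List.of("blk_1")));
    recentInvalidateSets.put("node2", new TreeSet<>(List.of("blk_2")));
    recentInvalidateSets.put("node3", new TreeSet<>(List.of("blk_3")));

    // Math.min() guards against asking for more nodes than exist.
    int nodesToProcess = Math.min(2, recentInvalidateSets.size());

    // Copy the keys into an ArrayList so Collections.swap() can be used
    // instead of a hand-rolled array swap.
    ArrayList<String> keyArray = new ArrayList<>(recentInvalidateSets.keySet());
    Random r = new Random();
    for (int i = 0; i < nodesToProcess; i++) {
      // Partial Fisher-Yates: move a random remaining key into position i.
      int j = i + r.nextInt(keyArray.size() - i);
      Collections.swap(keyArray, i, j);
    }
    List<String> chosen = keyArray.subList(0, nodesToProcess);
    System.out.println(chosen.size());
  }
}
{code}
The first {{nodesToProcess}} entries of {{keyArray}} are then a uniformly random selection of nodes, so every data-node gets an equal chance of being processed first.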

> A few optimizations to FsNamesystem#RecentInvalidateSets
> --------------------------------------------------------
>
>                 Key: HADOOP-5124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5124
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: optimizeInvalidate.patch, optimizeInvalidate1.patch
>
>
> This jira proposes a few optimizations to FsNamesystem#RecentInvalidateSets:
> 1. When removing all replicas of a block, do not traverse all nodes in the 
> map. Instead traverse only the nodes on which the block is located.
> 2. When dispatching blocks to datanodes in ReplicationMonitor, randomly 
> choose a predefined number of datanodes and dispatch blocks to those 
> datanodes. This strategy provides fairness to all datanodes, whereas the 
> current strategy always starts from the first datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
