[
https://issues.apache.org/jira/browse/HDFS-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502685#comment-13502685
]
Junping Du commented on HDFS-3495:
----------------------------------
Hi Nicholas,
Thanks for your review and your great comments! Please see my reply below:
1.It seems that we are not going to subclass Balancer. If it is the case, we
should not change the fields/methods to protected.
[JP] Sure. I remove all protected method in Balancer. The exception could be on
unit test as it needs a subclass of MiniDFSCluster
(MiniDFSClusterWithNodeGroup) which have different configuration in
initialization.
2. There are a few places initializing a NetworkTopology instance from conf.
How about adding a new getInstance(conf) method to NetworkTopology so that
Balancer, DatanodeManager and some tests could call it?
[JP] Sure. Updated.
3. The one line method areDataNodesOnSameNodeGroup(..) seems unnecessary. How
about using cluster.isOnSameNodeGroup(..) directly?
[JP] Agree. it could be better. Updated.
4.If we have an addTo(..) method in PendingBlockMove, chooseProxySource() can
be simplified as below:
[JP] It is awesome! It makes code much simplified and clear. Thanks!
5. The addTo method could also be used in Source.chooseNextBlockToMove().
[JP] It looks slight different case there. If we use addTo() to replace
original code, then we will lose "pendingBlock.source = this;" and add "
proxySource = target" which is different with original logic?
6.Let's combine chooseNodes() and chooseNodesForCustomFaultDomains()?
[JP] Sure. Two methods are combined.
7. chooseTargetOnSameNodeGroup and chooseSourceOnSameNodeGroup look the same.
Similarly, chooseTargetsOnSameNodeGroup and chooseSourcesOnSameNodeGroup looks
the same. I spent sometime to combine them. See if you think it is good.
[JP] It is super! I put it in new changes and the method of
selectCandidateOnSameNodeGroup can be used by other code in Balancer (I rename
to matchSourceWithTargetToMove as it is more general).
I will update patch soon.
> Update Balancer to support new NetworkTopology with NodeGroup
> -------------------------------------------------------------
>
> Key: HDFS-3495
> URL: https://issues.apache.org/jira/browse/HDFS-3495
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer
> Affects Versions: 1.1.0, 2.0.2-alpha
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: HADOOP-8473-Balancer-NodeGroup-aware.patch,
> HDFS-3495-v2.patch, HDFS-3495-v3.patch
>
>
> Since the Balancer is a Hadoop Tool, it was updated to be directly aware of
> four-layer hierarchy instead of creating an alternative Balancer
> implementation. To accommodate extensibility, a new protected method,
> doChooseNodesForCustomFaultDomain is now called from the existing chooseNodes
> method so that a subclass of the Balancer could customize the balancer
> algotirhm for other failure and locality topologies. An alternative option is
> to encapsulate the algorithm used for the four-layer hierarchy into a
> collaborating strategy class.
> The key changes introduced to support a four-layer hierarchy were to override
> the algorithm of choosing <source, target> pairs for balancing. Unit tests
> were created to test the new algorithm.
> The algorithm now makes sure to choose the target and source node on the same
> node group for balancing as the first priority. Then the overall balancing
> policy is: first doing balancing between nodes within the same nodegroup then
> the same rack and off rack at last. Also, we need to check no duplicated
> replicas live in the same node group after balancing.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira