[
https://issues.apache.org/jira/browse/HDFS-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502579#comment-13502579
]
Tsz Wo (Nicholas), SZE commented on HDFS-3495:
----------------------------------------------
Thank Junping for the update. Looked at the patch. You did a great job on
fixing/adding comments for the existing code
- It seems that we are not going to subclass Balancer. If it is the case, we
should not change the fields/methods to protected.
- There are a few places initializing a NetworkTopology instance from conf.
How about adding a new getInstance(conf) method to NetworkTopology so that
Balancer, DatanodeManager and some tests could call it?
{code}
+ cluster = ReflectionUtils.newInstance(
+ conf.getClass(
+ CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY,
+ NetworkTopology.class, NetworkTopology.class), conf)
{code}
- The one line method areDataNodesOnSameNodeGroup(..) seems unnecessary. How
about using cluster.isOnSameNodeGroup(..) directly?
- If we have an addTo(..) method in PendingBlockMove, chooseProxySource() can
be simplified as below:
{code}
private boolean addTo(BalancerDatanode bdn) {
if (bdn.addPendingBlock(this)) {
proxySource = bdn;
return true;
}
return false;
}
/* Now we find out source, target, and block, we need to find a proxy
*
* @return true if a proxy is found; otherwise false
*/
private boolean chooseProxySource() {
final DatanodeInfo targetDn = target.getDatanode();
boolean found = false;
for (BalancerDatanode loc : block.getLocations()) {
// check if there is replica which is on the same rack with the target
if (cluster.isOnSameRack(loc.getDatanode(), targetDn) && addTo(loc)) {
// if cluster is not nodegroup aware or the proxy is on the same
// nodegroup with target, then we already find the nearest proxy
if (!cluster.isNodeGroupAware()
|| cluster.isOnSameNodeGroup(loc.getDatanode(), targetDn)) {
return true;
}
found = true;
}
if (!found) {
// find out a non-busy replica out of rack of target
found = addTo(loc);
}
}
return found;
}
{code}
The addTo method could also be used in Source.chooseNextBlockToMove().
{code}
PendingBlockMove pendingBlock = new PendingBlockMove();
- if ( target.addPendingBlock(pendingBlock) ) {
+ if (pendingBlock.addTo(target)) {
// target is not busy, so do a tentative block allocation
- pendingBlock.source = this;
pendingBlock.target = target;
{code}
- Let's combine chooseNodes() and chooseNodesForCustomFaultDomains()?
- chooseTargetOnSameNodeGroup and chooseSourceOnSameNodeGroup look the same.
Similarly, chooseTargetsOnSameNodeGroup and chooseSourcesOnSameNodeGroup looks
the same. I spent sometime to combine them. See if you think it is good.
{code}
/* Decide all <source, target> pairs where source and target are
* on the same NodeGroup
*/
private void chooseNodesOnSameNodeGroup() {
/* first step: match each overUtilized datanode (source) to
* one or more underUtilized datanodes within same NodeGroup(targets).
*/
chooseOnSameNodeGroup(overUtilizedDatanodes, underUtilizedDatanodes);
/* match each remaining overutilized datanode (source) to below average
* utilized datanodes within the same NodeGroup(targets).
* Note only overutilized datanodes that haven't had that max bytes to move
* satisfied in step 1 are selected
*/
chooseOnSameNodeGroup(overUtilizedDatanodes, belowAvgUtilizedDatanodes);
/* match each remaining underutilized datanode to above average utilized
* datanodes within the same NodeGroup.
* Note only underutilized datanodes that have not had that max bytes to
* move satisfied in step 1 are selected.
*/
chooseOnSameNodeGroup(underUtilizedDatanodes, aboveAvgUtilizedDatanodes);
}
private <T extends BalancerDatanode> T chooseCandidateOnSameNodeGroup(
BalancerDatanode dn, Iterator<T> candidates) {
if (dn.isMoveQuotaFull()) {
for(; candidates.hasNext(); ) {
final T c = candidates.next();
if (!c.isMoveQuotaFull()) {
candidates.remove();
continue;
}
if (cluster.isOnSameNodeGroup(dn.getDatanode(), c.getDatanode())) {
return c;
}
}
}
return null;
}
private void selectCandidateOnSameNodeGroup(
Source source, BalancerDatanode target) {
long size = Math.min(source.availableSizeToMove(),
target.availableSizeToMove());
NodeTask nodeTask = new NodeTask(target, size);
source.addNodeTask(nodeTask);
target.incScheduledSize(nodeTask.getSize());
sources.add(source);
targets.add(target);
LOG.info("Decided to move "+StringUtils.byteDesc(size)+" bytes from "
+source.datanode.getName() + " to " + target.datanode.getName());
}
private <T extends BalancerDatanode> boolean chooseOnSameNodeGroup(
BalancerDatanode dn, Iterator<T> candidates) {
final T chosen = chooseCandidateOnSameNodeGroup(dn, candidates);
if (chosen == null) {
return false;
}
if (dn instanceof Source) {
selectCandidateOnSameNodeGroup((Source)dn, chosen);
} else {
selectCandidateOnSameNodeGroup((Source)chosen, dn);
}
if (!chosen.isMoveQuotaFull()) {
candidates.remove();
}
return true;
}
private <D extends BalancerDatanode, C extends BalancerDatanode>
void chooseOnSameNodeGroup(Collection<D> datanodes, Collection<C>
candidates) {
for (Iterator<D> i = datanodes.iterator(); i.hasNext();) {
final D source = i.next();
for(; chooseOnSameNodeGroup(source, candidates.iterator()); );
if (!source.isMoveQuotaFull()) {
i.remove();
}
}
}
{code}
> Update Balancer to support new NetworkTopology with NodeGroup
> -------------------------------------------------------------
>
> Key: HDFS-3495
> URL: https://issues.apache.org/jira/browse/HDFS-3495
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer
> Affects Versions: 1.1.0, 2.0.2-alpha
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: HADOOP-8473-Balancer-NodeGroup-aware.patch,
> HDFS-3495-v2.patch, HDFS-3495-v3.patch
>
>
> Since the Balancer is a Hadoop Tool, it was updated to be directly aware of
> four-layer hierarchy instead of creating an alternative Balancer
> implementation. To accommodate extensibility, a new protected method,
> doChooseNodesForCustomFaultDomain is now called from the existing chooseNodes
> method so that a subclass of the Balancer could customize the balancer
> algotirhm for other failure and locality topologies. An alternative option is
> to encapsulate the algorithm used for the four-layer hierarchy into a
> collaborating strategy class.
> The key changes introduced to support a four-layer hierarchy were to override
> the algorithm of choosing <source, target> pairs for balancing. Unit tests
> were created to test the new algorithm.
> The algorithm now makes sure to choose the target and source node on the same
> node group for balancing as the first priority. Then the overall balancing
> policy is: first doing balancing between nodes within the same nodegroup then
> the same rack and off rack at last. Also, we need to check no duplicated
> replicas live in the same node group after balancing.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira