[jira] [Commented] (HDFS-3495) Update Balancer to support new NetworkTopology with NodeGroup

Tsz Wo (Nicholas), SZE (JIRA) Wed, 21 Nov 2012 20:41:04 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502579#comment-13502579
 ]


Tsz Wo (Nicholas), SZE commented on HDFS-3495:
----------------------------------------------

Thank Junping for the update.  Looked at the patch.  You did a great job on 
fixing/adding comments for the existing code

- It seems that we are not going to subclass Balancer.  If it is the case, we 
should not change the fields/methods to protected.

- There are a few places initializing a NetworkTopology instance from conf.  
How about adding a new getInstance(conf) method to NetworkTopology so that 
Balancer, DatanodeManager and some tests could call it?
{code}
+    cluster = ReflectionUtils.newInstance(
+        conf.getClass(
+            CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY,
+            NetworkTopology.class, NetworkTopology.class), conf)
{code}

- The one line method areDataNodesOnSameNodeGroup(..) seems unnecessary.  How 
about using cluster.isOnSameNodeGroup(..) directly?

- If we have an addTo(..) method in PendingBlockMove, chooseProxySource() can 
be simplified as below:
{code}
    private boolean addTo(BalancerDatanode bdn) {
      if (bdn.addPendingBlock(this)) {
        proxySource = bdn;
        return true;
      }
      return false;
    }

    /* Now we find out source, target, and block, we need to find a proxy
     * 
     * @return true if a proxy is found; otherwise false
     */
    private boolean chooseProxySource() {
      final DatanodeInfo targetDn = target.getDatanode();
      boolean found = false;
      for (BalancerDatanode loc : block.getLocations()) {
        // check if there is replica which is on the same rack with the target
        if (cluster.isOnSameRack(loc.getDatanode(), targetDn) && addTo(loc)) {
          // if cluster is not nodegroup aware or the proxy is on the same 
          // nodegroup with target, then we already find the nearest proxy
          if (!cluster.isNodeGroupAware() 
              || cluster.isOnSameNodeGroup(loc.getDatanode(), targetDn)) {
            return true;
          }
          found = true;
        }

        if (!found) {
          // find out a non-busy replica out of rack of target
          found = addTo(loc);
        }
      }
      return found;
    }
{code}
The addTo method could also be used in Source.chooseNextBlockToMove().
{code}
         PendingBlockMove pendingBlock = new PendingBlockMove();
-        if ( target.addPendingBlock(pendingBlock) ) { 
+        if (pendingBlock.addTo(target)) { 
           // target is not busy, so do a tentative block allocation
-          pendingBlock.source = this;
           pendingBlock.target = target;
{code}

- Let's combine chooseNodes() and chooseNodesForCustomFaultDomains()?

- chooseTargetOnSameNodeGroup and chooseSourceOnSameNodeGroup look the same.  
Similarly, chooseTargetsOnSameNodeGroup and chooseSourcesOnSameNodeGroup looks 
the same.  I spent sometime to combine them.  See if you think it is good.
{code}
  /* Decide all <source, target> pairs where source and target are 
   * on the same NodeGroup
   */
  private void chooseNodesOnSameNodeGroup() {

    /* first step: match each overUtilized datanode (source) to
     * one or more underUtilized datanodes within same NodeGroup(targets).
     */
    chooseOnSameNodeGroup(overUtilizedDatanodes, underUtilizedDatanodes);

    /* match each remaining overutilized datanode (source) to below average 
     * utilized datanodes within the same NodeGroup(targets).
     * Note only overutilized datanodes that haven't had that max bytes to move
     * satisfied in step 1 are selected
     */
    chooseOnSameNodeGroup(overUtilizedDatanodes, belowAvgUtilizedDatanodes);

    /* match each remaining underutilized datanode to above average utilized 
     * datanodes within the same NodeGroup.
     * Note only underutilized datanodes that have not had that max bytes to
     * move satisfied in step 1 are selected.
     */
    chooseOnSameNodeGroup(underUtilizedDatanodes, aboveAvgUtilizedDatanodes);
  }

  private <T extends BalancerDatanode> T chooseCandidateOnSameNodeGroup(
      BalancerDatanode dn, Iterator<T> candidates) {
    if (dn.isMoveQuotaFull()) {
      for(; candidates.hasNext(); ) {
        final T c = candidates.next();
        if (!c.isMoveQuotaFull()) {
          candidates.remove();
          continue;
        }
        if (cluster.isOnSameNodeGroup(dn.getDatanode(), c.getDatanode())) {
          return c;
        }
      }
    }
    return null;
  }
  
  private void selectCandidateOnSameNodeGroup(
      Source source, BalancerDatanode target) {
    long size = Math.min(source.availableSizeToMove(), 
target.availableSizeToMove());
    NodeTask nodeTask = new NodeTask(target, size);
    source.addNodeTask(nodeTask);
    target.incScheduledSize(nodeTask.getSize());
    sources.add(source);
    targets.add(target);
    LOG.info("Decided to move "+StringUtils.byteDesc(size)+" bytes from "
        +source.datanode.getName() + " to " + target.datanode.getName());
  }

  private <T extends BalancerDatanode> boolean chooseOnSameNodeGroup(
      BalancerDatanode dn, Iterator<T> candidates) {
    final T chosen = chooseCandidateOnSameNodeGroup(dn, candidates);
    if (chosen == null) {
      return false;
    }
    if (dn instanceof Source) {
      selectCandidateOnSameNodeGroup((Source)dn, chosen);
    } else {
      selectCandidateOnSameNodeGroup((Source)chosen, dn);
    }
    if (!chosen.isMoveQuotaFull()) {
      candidates.remove();
    }
    return true;
  }

  private <D extends BalancerDatanode, C extends BalancerDatanode>
      void chooseOnSameNodeGroup(Collection<D> datanodes, Collection<C> 
candidates) {
    for (Iterator<D> i = datanodes.iterator(); i.hasNext();) {
      final D source = i.next();
      for(; chooseOnSameNodeGroup(source, candidates.iterator()); );
      if (!source.isMoveQuotaFull()) {
        i.remove();
      }
    }
  }
{code}

                
> Update Balancer to support new NetworkTopology with NodeGroup
> -------------------------------------------------------------
>
>                 Key: HDFS-3495
>                 URL: https://issues.apache.org/jira/browse/HDFS-3495
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 1.1.0, 2.0.2-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8473-Balancer-NodeGroup-aware.patch, 
> HDFS-3495-v2.patch, HDFS-3495-v3.patch
>
>
> Since the Balancer is a Hadoop Tool, it was updated to be directly aware of 
> four-layer hierarchy instead of creating an alternative Balancer 
> implementation. To accommodate extensibility, a new protected method, 
> doChooseNodesForCustomFaultDomain is now called from the existing chooseNodes 
> method so that a subclass of the Balancer could customize the balancer 
> algotirhm for other failure and locality topologies. An alternative option is 
> to encapsulate the algorithm used for the four-layer hierarchy into a 
> collaborating strategy class.
> The key changes introduced to support a four-layer hierarchy were to override 
> the algorithm of choosing <source, target> pairs for balancing. Unit tests 
> were created to test the new algorithm.
> The algorithm now makes sure to choose the target and source node on the same 
> node group for balancing as the first priority. Then the overall balancing 
> policy is: first doing balancing between nodes within the same nodegroup then 
> the same rack and off rack at last. Also, we need to check no duplicated 
> replicas live in the same node group after balancing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3495) Update Balancer to support new NetworkTopology with NodeGroup

Reply via email to