[ 
https://issues.apache.org/jira/browse/HDFS-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1480:
-----------------------------------------

          Component/s: name-node
          Description: 
It appears that all replicas of a block can end up in the same rack. The 
likelihood of this seems to be directly related to the decommissioning of 
nodes. 

After a rolling OS upgrade of a running cluster (decommission 3-10% of nodes, 
re-install etc., add them back), all replicas of about 0.16% of blocks ended 
up in the same rack.

The Hadoop NameNode UI etc. does not seem to know about such incorrectly 
replicated blocks, although "hadoop fsck .." does report that the blocks must 
be replicated on additional racks.

Looking at ReplicationTargetChooser.java, the following snippets seem suspect:

snippet-01:
{code}
    int maxNodesPerRack =
      (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
{code}

snippet-02:
{code}
    if (counter>maxTargetPerLoc) {
      logr.debug("Node "+NodeBase.getPath(node)+
                " is not chosen because the rack has too many chosen nodes");
      return false;
    }
{code}

snippet-03:
{code}
      default:
        chooseRandom(numOfReplicas, NodeBase.ROOT, excludedNodes,
                     blocksize, maxNodesPerRack, results);
      }
{code}
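
As a rough illustration of snippet-01, the sketch below only evaluates the 
same integer expression for a few rack counts. The class name, the method 
name and the replication factor of 3 are assumptions made for the example; 
nothing here is taken from ReplicationTargetChooser.java beyond the formula 
itself.

```java
public class MaxNodesPerRackDemo {

    // Same arithmetic as snippet-01; Java integer division truncates.
    static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
        return (totalNumOfReplicas - 1) / numOfRacks + 2;
    }

    public static void main(String[] args) {
        int replicas = 3; // assumed replication factor for this illustration
        int[] rackCounts = {1, 2, 3, 10, 40}; // hypothetical cluster sizes
        for (int racks : rackCounts) {
            int cap = maxNodesPerRack(replicas, racks);
            // A cap >= replicas means this limit alone does not stop every
            // replica from landing in one rack.
            System.out.println("racks=" + racks
                + " maxNodesPerRack=" + cap
                + " capAllowsSingleRack=" + (cap >= replicas));
        }
    }
}
```

With 3 replicas the cap evaluates to 3 for two racks and to 2 for three or 
more racks, so on larger clusters this limit by itself permits at most two 
replicas per rack. Whether the other code paths respect that limit is a 
separate question that this sketch does not answer.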

  was:
It appears that all replicas of a block can end up in the same rack. The 
likelihood of this seems to be directly related to the decommissioning of 
nodes. 

After a rolling OS upgrade of a running cluster (decommission 3-10% of nodes, 
re-install etc., add them back), all replicas of about 0.16% of blocks ended 
up in the same rack.

The Hadoop NameNode UI etc. does not seem to know about such incorrectly 
replicated blocks, although "hadoop fsck .." does report that the blocks must 
be replicated on additional racks.

Looking at ReplicationTargetChooser.java, the following snippets seem suspect:

snippet-01:

"""
    int maxNodesPerRack =
      (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;

"""

snippet-02:

"""
    if (counter>maxTargetPerLoc) {
      logr.debug("Node "+NodeBase.getPath(node)+
                " is not chosen because the rack has too many chosen nodes");
      return false;
    }
"""

snippet-03:

"""
      default:
        chooseRandom(numOfReplicas, NodeBase.ROOT, excludedNodes,
                     blocksize, maxNodesPerRack, results);
      }
"""

--


             Priority: Major  (was: Minor)
    Affects Version/s: 0.20.2

Formatted the description.

> All replicas for a block end up in same rack
> --------------------------------------------
>
>                 Key: HDFS-1480
>                 URL: https://issues.apache.org/jira/browse/HDFS-1480
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.2
>            Reporter: T Meyarivan
>
> It appears that all replicas of a block can end up in the same rack. The 
> likelihood of this seems to be directly related to the decommissioning of 
> nodes. 
> After a rolling OS upgrade of a running cluster (decommission 3-10% of 
> nodes, re-install etc., add them back), all replicas of about 0.16% of 
> blocks ended up in the same rack.
> The Hadoop NameNode UI etc. does not seem to know about such incorrectly 
> replicated blocks, although "hadoop fsck .." does report that the blocks 
> must be replicated on additional racks.
> Looking at ReplicationTargetChooser.java, the following snippets seem suspect:
> snippet-01:
> {code}
>     int maxNodesPerRack =
>       (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
> {code}
> snippet-02:
> {code}
>     if (counter>maxTargetPerLoc) {
>       logr.debug("Node "+NodeBase.getPath(node)+
>                 " is not chosen because the rack has too many chosen nodes");
>       return false;
>     }
> {code}
> snippet-03:
> {code}
>       default:
>         chooseRandom(numOfReplicas, NodeBase.ROOT, excludedNodes,
>                      blocksize, maxNodesPerRack, results);
>       }
> {code}
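
To make the comparison in snippet-02 concrete, here is a minimal standalone 
sketch of a per-rack cap check. It is not the Hadoop implementation: the 
class and method names and the rack paths are hypothetical, and starting the 
counter at 1 (to count the candidate node itself) is an assumption, since the 
quoted snippet does not show how counter is initialised.

```java
import java.util.Arrays;
import java.util.List;

public class RackCapCheckDemo {

    // Returns true when adding a node on candidateRack keeps that rack at or
    // below maxTargetPerLoc, using the same "counter > max" test as snippet-02.
    static boolean withinRackCap(String candidateRack,
                                 List<String> chosenRacks,
                                 int maxTargetPerLoc) {
        int counter = 1; // assumption: the candidate itself counts as one
        for (String rack : chosenRacks) {
            if (rack.equals(candidateRack)) {
                counter++;
            }
        }
        return !(counter > maxTargetPerLoc);
    }

    public static void main(String[] args) {
        // Two replicas already placed on the same (hypothetical) rack.
        List<String> chosen = Arrays.asList("/rack1", "/rack1");

        // With a cap of 2, a third node on /rack1 is rejected.
        System.out.println(withinRackCap("/rack1", chosen, 2));
        // With a cap of 3, the same node is accepted.
        System.out.println(withinRackCap("/rack1", chosen, 3));
        // A node on a different rack passes under either cap.
        System.out.println(withinRackCap("/rack2", chosen, 2));
    }
}
```

Under these assumptions the check rejects a third same-rack replica only when 
the cap passed in is below 3, so the value of maxNodesPerRack and the set of 
already-chosen nodes handed to the check both matter.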

        

Reply via email to