Rebalance data blocks when new data nodes added or data nodes become full
-------------------------------------------------------------------------

                 Key: HADOOP-1652
                 URL: https://issues.apache.org/jira/browse/HADOOP-1652
             Project: Hadoop
          Issue Type: New Feature
          Components: dfs
    Affects Versions: 0.13.0
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
             Fix For: 0.15.0


When a new data node joins an HDFS cluster, it holds little data. Any map 
task assigned to that machine therefore most likely reads non-local data, 
increasing the use of network bandwidth. On the other hand, when some data 
nodes become full, new data blocks are placed only on the non-full data nodes, 
reducing the read parallelism of those blocks. 
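
One way to make "imbalance" concrete is to compare each data node's disk 
utilization with the cluster-wide average. The sketch below is only an 
illustration of that idea, not the approach proposed in this issue: the 
function name classify_nodes, the input dictionaries, and the 10% threshold 
are all assumptions.

```python
def classify_nodes(used, capacity, threshold=0.10):
    """Split nodes into over- and under-utilized sets.

    used / capacity: dicts mapping a node name to bytes used / total bytes.
    threshold: hypothetical tolerance around the cluster average (a fraction).
    """
    # Cluster-wide utilization: total bytes used over total capacity.
    cluster_util = sum(used.values()) / sum(capacity.values())
    over, under = [], []
    for node in sorted(used):
        util = used[node] / capacity[node]
        if util > cluster_util + threshold:
            over.append(node)    # candidate source of blocks
        elif util < cluster_util - threshold:
            under.append(node)   # candidate destination for blocks
    return cluster_util, over, under


# Two nearly full nodes and one freshly added, nearly empty node.
used = {"dn1": 90, "dn2": 80, "dn3": 5}
capacity = {"dn1": 100, "dn2": 100, "dn3": 100}
avg, over, under = classify_nodes(used, capacity)
```

Under these assumptions, dn1 and dn2 land in the over-utilized list and dn3 
in the under-utilized list, which matches the two situations described above.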

This jira aims to find an approach to redistribute data blocks when imbalance 
occurs in the cluster.  A solution should meet the following requirements:
1. It maintains data availability guarantees, in the sense that rebalancing 
does not reduce the number of replicas that a block has or the number of racks 
on which the block resides.
2. An administrator should be able to invoke and interrupt rebalancing from the 
command line.
3. Rebalancing should be throttled so that it neither saturates the network 
nor makes the namenode too busy to serve incoming requests.
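
Requirement 1 can be read as an invariant to check before each block move. 
The following is a minimal sketch under stated assumptions, not the design 
adopted for this issue: move_is_safe, the replicas set, and the rack_of 
mapping are hypothetical names introduced only for illustration.

```python
def move_is_safe(replicas, rack_of, source, target):
    """Return True if moving one replica from source to target keeps
    both the replica count and the number of distinct racks.

    replicas: set of node names currently holding the block.
    rack_of:  dict mapping every node name to its rack name.
    """
    # The source must hold the block, and the target must not already hold
    # it; otherwise the move would lose a replica or create a duplicate.
    if source not in replicas or target in replicas:
        return False
    after = (replicas - {source}) | {target}
    racks_before = {rack_of[n] for n in replicas}
    racks_after = {rack_of[n] for n in after}
    return (len(after) >= len(replicas)
            and len(racks_after) >= len(racks_before))


rack_of = {"a": "r1", "b": "r1", "f": "r1", "c": "r2", "d": "r2", "e": "r3"}
replicas = {"a", "b", "c"}                        # spans racks r1 and r2
ok_same_rack = move_is_safe(replicas, rack_of, "c", "d")   # stays on r2
bad_loses_rack = move_is_safe(replicas, rack_of, "c", "f") # r2 would be lost
```

The check only covers the state after the move. To keep the replica count 
from dipping during the move, the copy to the target would presumably have to 
complete before the replica on the source is deleted.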

