+1
The code looks good in general.  It is great that there are a lot of tests and
documentation.  Some minor comments which can be addressed after merge:

- There are a few TODOs in the code.
- Tried the help command "hdfs diskbalancer -help plan".  There is a typo
  "wetolerate" in --thresholdPercentage.  Also, we should mention the unit for
  --bandwidth.
- We should avoid using the same class name such as DiskBalancer, which is
  defined in both the datanode and tools packages.  It may be better to call it
  DiskBalancerCli for the one in tools.
- I still think that it is better to use weighted mean and weighted variance in
  the calculation.
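
To make the last point concrete, here is a minimal sketch of a weighted mean and weighted variance. It is not code from the HDFS-1312 branch: the class name WeightedStats, the method names, and the choice of disk capacity as the weight and used fraction as the value are all assumptions made for illustration.

```java
/**
 * Illustrative only: weighted statistics over per-disk usage.
 * Class and method names are hypothetical, not taken from the HDFS-1312 branch.
 * Assumption: values[i] is the used fraction of disk i, weights[i] its capacity.
 */
public final class WeightedStats {
    private WeightedStats() { }

    /** Weighted mean: sum(w_i * x_i) / sum(w_i). */
    public static double weightedMean(double[] values, double[] weights) {
        checkInputs(values, weights);
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += weights[i] * values[i];
            weightTotal += weights[i];
        }
        return weightedSum / weightTotal;
    }

    /** Weighted population variance: sum(w_i * (x_i - mean)^2) / sum(w_i). */
    public static double weightedVariance(double[] values, double[] weights) {
        double mean = weightedMean(values, weights);
        double weightedSquares = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < values.length; i++) {
            double diff = values[i] - mean;
            weightedSquares += weights[i] * diff * diff;
            weightTotal += weights[i];
        }
        return weightedSquares / weightTotal;
    }

    /** Rejects mismatched or empty arrays, negative weights, and zero total weight. */
    private static void checkInputs(double[] values, double[] weights) {
        if (values.length == 0 || values.length != weights.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double weightTotal = 0.0;
        for (double w : weights) {
            if (w < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            weightTotal += w;
        }
        if (weightTotal <= 0.0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
    }
}
```

For example, with used fractions {0.9, 0.1} and capacities {1.0, 3.0}, the weighted mean is 0.3, whereas the unweighted mean is 0.5; the larger disk dominates the result.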
Thanks.
Tsz-Wo
 

    On Thursday, June 16, 2016 8:38 AM, Anu Engineer 
<aengin...@hortonworks.com> wrote:
 
 

  Hi All,

I would like to propose a merge vote for HDFS-1312 (Disk balancer) branch to 
trunk. This branch creates a new tool that allows balancing of data on a 
datanode.

The voting commences now and will run for 7 days till Jun/22/2016 5:00 PM PST.

This tool distributes data evenly between the disks of the same type on a
datanode. This is useful if a disk has been replaced or if some disks are
running out of space compared to the rest of the disks.

The currently supported commands are:

1. Plan - Allows user to create a plan and review it. The plan describes how 
the data will be moved in the data node.

2. Execute - Allows execution of a plan against a datanode.

3. Query - Queries the status of disk balancer execution.

4. Cancel - Cancels a running disk balancer plan.

5. Report - Reports the current state of data distribution on a node.


·        The original proposal that captures the rationale and possible 
solution is here.  [ 
https://issues.apache.org/jira/secure/attachment/12755226/disk-balancer-proposal.pdf
 ]

·        The updated architecture and test plan document is here. [ 
https://issues.apache.org/jira/secure/attachment/12810720/Architecture_and_test_update.pdf
 ]

·        The merge patch that is a diff against trunk is posted here. [ 
https://issues.apache.org/jira/secure/attachment/12810943/HDFS-1312.001.patch ]

·        The user documentation which will be part of Apache is posted here. [ 
https://issues.apache.org/jira/secure/attachment/12805976/HDFS-9547-HDFS-1312.002.patch
 ]


HDFS-1312 has a set of sub-tasks and they are ordered in the same sequence as 
they were committed to HDFS-1312. Hopefully this will make it easy to code 
review this branch.

There is a set of commands which we would like to add later, including 
discovering which datanodes in the cluster would benefit from running disk 
balancer.
Appropriate JIRAs for these future work items are filed under HDFS-1312.

Disk Balancer is made possible due to the work of many community members 
including Arpit Agarwal, Vinayakumar B, Mingliang Liu, Tsz Wo Nicholas Sze,
Lei (Eddy) Xu and Xiaobing Zhou. I would like to thank them all for the effort 
and support.

Thanks
Anu


 
  
