[ 
https://issues.apache.org/jira/browse/HDFS-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037290#comment-15037290
 ] 

Anu Engineer commented on HDFS-9469:
------------------------------------

[~szetszwo], Thanks for the detailed review. Here are my thoughts on your 
comments. I will update the review soon with a new patch.

bq. I think we need to change the data model to use mean and variance before 
adding planner. Otherwise, it is harder to change later.
Will do. Right now it is masked behind isBalancingNeeded. I will file a Jira 
and fix that.

bq. computePlan: top is "The total number of nodes to process". Then what is 
nodesToProcess.size()? Is it supposed top >= nodesToProcess.size()?
I see the comments are confusing. Top is the number of nodes that gets send 
down to the program by the user, and NodesToProcess is the list we discover 
from the cluster. I agree that top is badly named.

bq. computePoolSize return 0 if nodeCount is 9000. It should not " % 100 " at 
the end.
Thanks for catching that , fixed.

bq. It logs a messge per node. Is it needed?
I can see that it is noisy since it is going the be same planner most of the 
time.

bq. Is the planner supposed to be fixed for a single run?
Depends on how we define a single run. if you are asking is the planner fixed 
for the duration of a planner being run, yes. But on the other hand nothing 
prevents the user from creating a plan and executing it later. That means 
technically it is possible to have different plans and then execute these plans.

bq. What other planners are we going to support?
The idea behind the planner is that it will serve as a generic disk layout 
tool. At some point in time it could merge with mover, for example, and that 
would need a different kind of planner.

bq. It should throw an exception instead of returning null at the end.
 Will do.

bq. We should use LOG.error instead of LOG.fatal below.
Will do.

bq. The GreedyPlanner is an algorithm. The input is a DiskBalancerDataNode. So 
node should be a parameter of plan() but not a field.
Will do.

bq. Use StringUtils.TraditionalBinaryPrefix.long2String(..) instead of adding 
getSizeString.
Thanks for the pointer, did not know about that.

bq. I did not continue reviewing the computation since it requires the data 
model change.
if you are referring to weighted mean and variance , they are technically 
hidden behind the function of isBalancingNeeded. I will certainly update the 
algorithm as you have proposed.  Just wanted to make sure that you are 
commenting about the same issue, if it is indeed the same issue, I will update 
this patch with the changes related to this one and update the data models in 
another Jira since we will need some tests to makes sure that code is working 
correctly with the weighted mean and variance changes.








> DiskBalancer : Add Planner 
> ---------------------------
>
>                 Key: HDFS-9469
>                 URL: https://issues.apache.org/jira/browse/HDFS-9469
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer & mover
>    Affects Versions: 2.8.0
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>         Attachments: HDFS-9469-HDFS-1312.001.patch, 
> HDFS-9469-HDFS-1312.002.patch
>
>
> Disk Balancer reads the cluster data and then creates a plan for the data 
> moves based on the snap-shot of the data read from the nodes. This plan is 
> later submitted to data nodes for execution. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to