[ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068956#comment-15068956 ]

Anu Engineer commented on HDFS-1312:
------------------------------------

[~eddyxu] Very pertinent questions. I will probably steal these questions and 
add them to the diskbalancer documentation.

bq. Will the disk balancer be a daemon or a CLI tool that waits for the 
process to finish?

It is a combination of both. Let me explain in greater detail. We rely on the 
datanodes to do the actual copying, and a datanode will run a background 
thread if there is a disk balancing job to execute. A disk balancer job is a 
list of statements, each containing a source volume, a destination volume, 
and a number of bytes - in the code this is called a MoveStep 
({{org.apache.hadoop.hdfs.server.diskbalancer.planner.MoveStep}}). The CLI 
reads the current state of the cluster and computes a set of MoveSteps, which 
together are called a Plan. The advantage of this approach is that it lets 
the administrator review the plan before executing it against the datanode. 
The plan can also be persisted to a file if needed, or simply submitted to a 
datanode via the submitDiskBalancerPlan RPC (HDFS-9588).
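
To make that concrete, here is a minimal sketch of what a plan amounts to. 
The class and field names below are simplified illustrations, not the actual 
diskbalancer API:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the structures described above; names are
// illustrative, not the actual diskbalancer classes.
class MoveStep {
  final String sourceVolume;       // volume to move data off of
  final String destinationVolume;  // volume to move data onto
  final long bytesToMove;          // how many bytes to shift

  MoveStep(String src, String dst, long bytes) {
    this.sourceVolume = src;
    this.destinationVolume = dst;
    this.bytesToMove = bytes;
  }
}

class Plan {
  // A plan is simply an ordered list of self-contained moves for one datanode.
  final List<MoveStep> steps = new ArrayList<>();
}
{code}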

The datanode takes this plan and executes the moves in the background, so 
there is no new daemon; the CLI tool is only used for planning and generating 
the data movement for each datanode. This is quite similar to the balancer, 
but instead of sending one RPC at a time we submit all of the moves together 
to the datanode, since the moves are entirely self-contained within a 
datanode. To sum up: the CLI tool submits a plan but does not wait for it to 
finish executing, and the datanode performs the moves itself.
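
Putting the pieces together, the CLI side of the workflow is roughly the 
following. {{Planner}}, {{DataNodeClient}}, and {{DiskBalancerCli}} are 
hypothetical stand-ins; only the submitDiskBalancerPlan RPC itself comes from 
HDFS-9588:

{code:java}
// Rough sketch of the CLI workflow, reusing the Plan/MoveStep types above.
// Planner and DataNodeClient are hypothetical stand-ins; only the
// submitDiskBalancerPlan RPC itself comes from HDFS-9588.
interface Planner {
  Plan computePlan(String datanode);       // read cluster state, emit MoveSteps
}

interface DataNodeClient {
  void submitDiskBalancerPlan(Plan plan);  // one RPC carrying the whole plan
}

class DiskBalancerCli {
  static void run(Planner planner, DataNodeClient dn, String datanode) {
    Plan plan = planner.computePlan(datanode);
    // The plan could be persisted to a file here for the admin to review.
    dn.submitDiskBalancerPlan(plan);       // submit and return; do not wait
  }
}
{code}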

bq. Where is the Planner executed? A DiskBalancer daemon / CLI tool or NN?
The planner runs in the CLI tool, as described above. On the datanode side 
there is a background thread, just like {{DirectoryScanner}}, that executes 
the plan.
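
Schematically, that executor is just a worker thread walking the submitted 
plan. This is a sketch in the spirit of the design above, not the actual 
datanode code:

{code:java}
// Sketch of a datanode-side executor thread, in the spirit of
// DirectoryScanner's background thread; not the actual implementation.
class PlanExecutor implements Runnable {
  private final Plan plan;  // the Plan type sketched earlier

  PlanExecutor(Plan plan) {
    this.plan = plan;
  }

  @Override
  public void run() {
    for (MoveStep step : plan.steps) {
      // Each move is local to this datanode: shift step.bytesToMove bytes
      // from step.sourceVolume to step.destinationVolume.
      executeMove(step);
    }
  }

  private void executeMove(MoveStep step) {
    // Placeholder: the datanode reuses its existing mover code path here.
  }
}
{code}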

bq. When copying a replica from one volume to another, how do we prevent 
concurrency issues with the DirectoryScanner running in the background?

Great question. This is one of the motivations for making the plan executor a 
thread inside the datanode: it keeps us out of concurrent-access trouble with 
{{DirectoryScanner}}. We take the same locks as DirectoryScanner when we move 
the blocks. We also have no new code for these moves; we rely on the existing 
mover code path inside the datanode to perform the actual move.
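
As a schematic of that locking discipline - assuming a shared dataset-level 
lock object, which stands in for the real lock inside the datanode:

{code:java}
// Schematic only: the move takes the same lock DirectoryScanner takes,
// so the two cannot touch the volumes concurrently. The lock object and
// method names are stand-ins for the real datanode internals.
class LockedMover {
  private final Object datasetLock;  // shared with DirectoryScanner

  LockedMover(Object datasetLock) {
    this.datasetLock = datasetLock;
  }

  void executeMove(MoveStep step) {
    synchronized (datasetLock) {
      // While this lock is held, DirectoryScanner cannot scan or reconcile
      // the volumes being mutated, so the copy cannot race with it.
      copyBytes(step.sourceVolume, step.destinationVolume, step.bytesToMove);
    }
  }

  private void copyBytes(String src, String dst, long bytes) {
    // Placeholder for the existing mover code path.
  }
}
{code}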

bq. Is there a limit on how many such disk balancer jobs can run in the 
cluster?
Yes, one per datanode. If the submit RPC is called while a job is executing, 
we will reject the new submission. Please look at the 
{{ClientDatanodeProtocol.proto#SubmitDiskBalancerPlanResponseProto#submitResults}}
 enum in HDFS-9588 to see the set of errors that we return. One of them is 
PLAN_ALREADY_IN_PROGRESS, which is the error you would see if you tried to 
submit another job to a datanode that is already executing one.
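
In sketch form, the one-job-per-datanode rule is a simple guard on the submit 
path. Only the PLAN_ALREADY_IN_PROGRESS name comes from the HDFS-9588 enum; 
the rest is illustrative:

{code:java}
// Sketch of the submit-side guard; PLAN_ALREADY_IN_PROGRESS is from the
// HDFS-9588 result enum, everything else is an illustrative stand-in.
enum SubmitResult { PLAN_ACCEPTED, PLAN_ALREADY_IN_PROGRESS }

class DiskBalancerService {
  private boolean planRunning = false;

  synchronized SubmitResult submitPlan(Plan plan) {
    if (planRunning) {
      return SubmitResult.PLAN_ALREADY_IN_PROGRESS;  // reject the new job
    }
    planRunning = true;  // cleared when the executor finishes (omitted here)
    new Thread(new PlanExecutor(plan)).start();      // run moves in background
    return SubmitResult.PLAN_ACCEPTED;
  }
}
{code}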

bq. Could another job query the status of a running job?

Yes. I will post a patch soon that adds support for the QueryPlan RPC, which 
returns the current status of an executing plan or of the last executed plan.
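
Purely as an illustration, such a status result might carry something like 
the following; the actual RPC shape will be whatever that patch defines:

{code:java}
// Illustrative guess at a QueryPlan-style result; the real fields will be
// whatever the forthcoming patch defines.
class PlanStatus {
  String planId;     // identifies the executing or last executed plan
  String state;      // e.g. "RUNNING" or "DONE"
  long bytesMoved;   // progress so far
}
{code}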

> Re-balance disks within a Datanode
> ----------------------------------
>
>                 Key: HDFS-1312
>                 URL: https://issues.apache.org/jira/browse/HDFS-1312
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode
>            Reporter: Travis Crawford
>            Assignee: Anu Engineer
>         Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling 
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks more heavily when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.
