[
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068956#comment-15068956
]
Anu Engineer commented on HDFS-1312:
------------------------------------
[~eddyxu] Very pertinent questions. I will probably steal these questions and
add them to the diskbalancer documentation.
bq. Will disk balancer be a daemon or a CLI tool that waits for the process
to finish?
It is a combination of both. Let me explain in greater detail. We rely on
the datanodes to do the actual copy, and a datanode will run a background
thread if there is a disk balancing job to run. A disk balancer job is a
list of statements, each containing a source volume, a destination volume,
and a number of bytes; in the code this is called a MoveStep
({{org.apache.hadoop.hdfs.server.diskbalancer.planner.MoveStep}}). The CLI
reads the current state of the cluster and computes a set of MoveSteps,
which is called a Plan. The advantage of this approach is that it allows the
administrator to review the plan, if needed, before executing it against the
datanode. The plan can also be persisted to a file, or submitted directly to
a datanode via the submitDiskbalancerPlan RPC (HDFS-9588).
The datanode takes this plan and executes the moves in the background, so
there is no new daemon; the CLI tool is used only for planning and
generating the data movement for each datanode. This is quite similar to the
balancer, but instead of sending one RPC at a time we submit all of them
together to the datanode, since our moves are entirely self-contained within
a datanode. To sum up: we have a CLI tool that submits a plan but does not
wait for the plan to finish executing, and the datanode performs the moves
itself.
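To make the structures concrete, here is a minimal sketch of the
Plan/MoveStep shape; the field and class layout is my shorthand for
illustration, not the actual fields of the HDFS classes.
{code:java}
// Minimal sketch of the Plan/MoveStep shape described above. Names are
// illustrative shorthand, not the actual HDFS-1312 classes.
import java.util.ArrayList;
import java.util.List;

class MoveStep {
  final String sourceVolume;       // e.g. /data/disk1
  final String destinationVolume;  // e.g. /data/disk4
  final long bytesToMove;          // bytes this step relocates

  MoveStep(String src, String dest, long bytes) {
    this.sourceVolume = src;
    this.destinationVolume = dest;
    this.bytesToMove = bytes;
  }
}

// A Plan is simply an ordered list of MoveSteps for one datanode.
class Plan {
  final List<MoveStep> steps = new ArrayList<>();
}
{code}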
bq. Where is the Planner executed? A DiskBalancer daemon / CLI tool or NN?
The planner itself runs in the CLI tool; on the datanode side there is a
background thread, just like {{DirectoryScanner}}, that executes the plan.
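To picture the execution side, here is a rough sketch building on the
MoveStep/Plan sketch above; the class name is hypothetical and
{{moveBlocks}} merely stands in for the existing mover code path.
{code:java}
// Hypothetical executor thread, for illustration only.
class DiskBalancerExecutor implements Runnable {
  private final Plan plan;

  DiskBalancerExecutor(Plan plan) {
    this.plan = plan;
  }

  @Override
  public void run() {
    for (MoveStep step : plan.steps) {
      // Each step is self-contained within this datanode.
      moveBlocks(step);
    }
  }

  private void moveBlocks(MoveStep step) {
    // Placeholder: the real datanode reuses its existing
    // block-mover code path here.
  }
}
{code}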
bq. When copying a replica from one volume to another, how do we prevent
concurrency issues with the DirectoryScanner running in the background?
Great question. This is one of the motivations for making the plan executor
a thread inside the datanode: it keeps us from running into concurrent-access
issues with {{DirectoryScanner}}. We take the same locks as DirectoryScanner
when we move the blocks. Also, we don't have any new code for these moves; we
rely on the existing mover code path inside the datanode to perform the
actual move.
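Roughly, the idea is the sketch below; the lock object and method names are
assumptions for illustration, not the actual dataset API.
{code:java}
// Sketch of the locking idea only; names are assumptions.
class VolumeMover {
  // The same monitor object DirectoryScanner synchronizes on.
  private final Object datasetLock;

  VolumeMover(Object datasetLock) {
    this.datasetLock = datasetLock;
  }

  void moveReplica(MoveStep step) {
    synchronized (datasetLock) {
      // While we hold the dataset lock, DirectoryScanner cannot
      // reconcile the volumes we are mutating, so neither side
      // observes a half-moved replica.
      // ... copy via the existing mover code path ...
    }
  }
}
{code}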
bq. Is there a limit on how many such disk balancer jobs can run in the
cluster?
Yes, one per datanode. If the submit RPC is called while a job is already
executing, we will reject the new submission. Please look at the
{{ClientDatanodeProtocol.proto#SubmitDiskBalancerPlanResponseProto#submitResults}}
enum in HDFS-9588 to see the set of errors that we return. One of them is
{{PLAN_ALREADY_IN_PROGRESS}}, which is the error you would see if you tried
to submit another job to a datanode that is already executing one.
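As a sketch of that rule, reusing the Plan and executor sketches above: only
{{PLAN_ALREADY_IN_PROGRESS}} is taken from the proto; the other names here
are placeholders.
{code:java}
// Sketch of the one-plan-per-datanode rule. PLAN_ALREADY_IN_PROGRESS
// mirrors the submitResults enum in ClientDatanodeProtocol.proto
// (HDFS-9588); the other names are illustrative only.
enum SubmitResult { PLAN_ACCEPTED, PLAN_ALREADY_IN_PROGRESS }

class DiskBalancerService {
  private boolean planRunning = false;

  synchronized SubmitResult submitPlan(Plan plan) {
    if (planRunning) {
      return SubmitResult.PLAN_ALREADY_IN_PROGRESS;
    }
    planRunning = true;
    // Completion handling omitted: the executor would clear
    // planRunning when the plan finishes.
    new Thread(new DiskBalancerExecutor(plan)).start();
    return SubmitResult.PLAN_ACCEPTED;
  }
}
{code}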
bq. Could another job query the status of a running job?
Yes. I will post a patch soon adding a QueryPlan RPC, which will return the
current status of an executing or last-executed plan.
> Re-balance disks within a Datanode
> ----------------------------------
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Travis Crawford
> Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to "full disk woes" on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
> In write-heavy environments this will still make use of all spindles,
> equalizing disk use over time (a sketch of this idea follows below).
> - Rebalancing blocks locally. This would help equalize disk use as disks are
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.
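For illustration only, the first proposal above (weighting less-used disks
when placing new blocks) could look like the sketch below; every name is
hypothetical, and this is not the actual datanode volume-choosing API.
{code:java}
// Hypothetical sketch of weighted volume choice: pick a volume with
// probability proportional to its free space, so emptier disks fill
// faster.
import java.util.List;
import java.util.Random;

class WeightedVolumeChooser {
  private final Random random = new Random();

  String chooseVolume(List<String> volumes, List<Long> freeBytes) {
    long total = 0;
    for (long free : freeBytes) {
      total += free;
    }
    long pick = (long) (random.nextDouble() * total);
    for (int i = 0; i < volumes.size(); i++) {
      pick -= freeBytes.get(i);
      if (pick < 0) {
        return volumes.get(i);
      }
    }
    return volumes.get(volumes.size() - 1); // guard against rounding
  }
}
{code}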