[
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102381#comment-15102381
]
Anu Engineer commented on HDFS-1312:
------------------------------------
*Notes from the call on Jan 14th, 2016*
Attendees: Andrew Wang, Lei Xu, Colin McCabe, Chris Trezzo, Ming Ma, Arpit
Agarwal, Jitendra Pandey, Jing Zhao, Mingliang Liu, Xiaobing Zhou, Anu
Engineer, and others who dialed in (I could only see phone numbers, not names;
my apologies to anyone I am missing).
We discussed the goals of HDFS-1312. Andrew Wang mentioned that HDFS-1804 is
used by many customers, is safe, and has been used in production for a while.
Jitendra pointed out that we still have many customers who are not using
HDFS-1804, so he suggested that we focus the discussion on HDFS-1312. We
explored the pros and cons of having the planner completely inside the
datanode, along with various other user scenarios. As a team we wanted to make
sure that all major scenarios are identified and covered in this review.
Ming Ma raised an interesting question, which we decided to address: he wanted
to find out whether running the disk balancer has any quantifiable performance
effect. Anu mentioned that since we have bandwidth control, admins should be
able to control it. However, any disk I/O has a cost, so we decided to do some
performance measurement of the disk balancer.
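For context, a rough sketch of how a bandwidth cap could bound the I/O cost of
a block copy loop, using the DataTransferThrottler that already exists in HDFS
(the copy loop itself is illustrative, not the actual disk balancer data path):

{code:java}
import org.apache.hadoop.hdfs.util.DataTransferThrottler;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ThrottledCopy {
  // Copies data at no more than bandwidthPerSec bytes per second, so
  // balancing I/O stays within the admin-configured bandwidth budget.
  static void copy(InputStream in, OutputStream out, long bandwidthPerSec)
      throws IOException {
    DataTransferThrottler throttler = new DataTransferThrottler(bandwidthPerSec);
    byte[] buf = new byte[64 * 1024];
    int read;
    while ((read = in.read(buf)) != -1) {
      out.write(buf, 0, read);
      throttler.throttle(read); // sleeps if we are over the bandwidth budget
    }
  }
}
{code}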
Andrew Wang raised the question of performance counters and how external tools
like Cloudera Manager or Ambari would use the disk balancer. He also explored
how we will be able to integrate this tool with other management tools. We
agreed that we will have a set of performance counters exposed via datanode
JMX. We also discussed design trade-offs of doing disk balancing inside the
datanode vs. outside. We reviewed lots of administrative scenarios and
concluded that this tool would be able to address them. We also concluded that
the tool does not do any cluster-wide planning and that all data movement is
confined to the datanode.
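As a sketch of what those counters could look like (the counter names here are
illustrative, not an agreed-upon set), Hadoop's metrics2 library already
publishes registered sources through the datanode's JMX endpoint:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Hypothetical metrics source; once registered, these counters appear
// under the datanode's JMX alongside the existing metrics2 sources.
@Metrics(name = "DiskBalancerMetrics", about = "Disk balancer metrics",
         context = "dfs")
public class DiskBalancerMetrics {
  @Metric("Bytes moved between volumes") MutableCounterLong bytesMoved;
  @Metric("Blocks moved between volumes") MutableCounterLong blocksMoved;
  @Metric("Errors while moving blocks") MutableCounterLong moveErrors;

  public static DiskBalancerMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "DiskBalancerMetrics", "Disk balancer metrics",
        new DiskBalancerMetrics());
  }

  public void incrBytesMoved(long delta) {
    bytesMoved.incr(delta);
  }
}
{code}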
Colin McCabe brought up a set of interesting questions. He made us think
through the scenario of data changing in the datanode while the disk balancer
is operational, as well as the impact of future disks with shingled magnetic
recording and large disk sizes. He was wondering how long it would take to
balance a datanode if it is filled with 6 TB or even 20 TB drives. The
conclusion was that if you had large slow disks and lots of data in a node, it
would take proportionally more time. For the question of data changing in the
datanode, the disk balancer would support a tolerance value, or "good enough"
value, for balancing. That is, an administrator can specify that getting
within 10% of the expected data distribution is good enough. We also discussed
a scenario called "Hot Remove": just like hot swap, small cluster owners might
find it useful to move all data out of a hard disk before removing it, say to
upgrade to a larger size.
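A minimal sketch of the tolerance check described above (the method name and
per-volume inputs are assumptions for illustration, not the design's actual
interface):

{code:java}
public class ToleranceCheck {
  /**
   * Returns true if a volume's used ratio is within tolerancePercent
   * percentage points of the ideal ratio, i.e. "good enough" to stop
   * balancing. For example, tolerancePercent = 10 accepts any volume
   * within 10 points of the expected data distribution.
   */
  static boolean isWithinTolerance(long usedBytes, long capacityBytes,
                                   double idealUsedRatio,
                                   double tolerancePercent) {
    double usedRatio = (double) usedBytes / capacityBytes;
    return Math.abs(usedRatio - idealUsedRatio) * 100.0 <= tolerancePercent;
  }
}
{code}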
Ming Ma pointed out that for them it is easier and simpler to decommission a
node; if you have a large number of nodes, relying on the network is more
efficient than micro-managing a datanode. We agreed with that, but for small
cluster owners (say, fewer than 5 or 10 nodes) it might make sense to support
the ability to move data out of a disk. Anu pointed out that the disk balancer
design does accommodate that capability even though it is not the primary goal
of the tool.
Ming Ma also brought up how Twitter runs the balancer tool today: it is always
running against Twitter's clusters. We discussed whether having that balancer
as a part of the namenode makes sense, but concluded that it was out of scope
for HDFS-1312. Andrew mentioned that it is the right thing to do in the long
run. We also discussed whether the disk balancer should trigger automatically
instead of being an administrator-driven task. We were worried that it would
trigger and incur I/O when higher-priority compute jobs were running in the
cluster, so we decided we are better off letting an admin decide when it is a
good time to run the disk balancer.
At the end of the review, Andrew asked if we could finish this work by the end
of next month and offered to help make sure that this feature is done sooner.
*Action Items:*
* Analyze performance impact of disk balancer.
* Add a set of performance counters exposed via datanode JMX.
Please feel free to comment / correct these notes if I have missed anything.
Thank you all for calling in and for having such a great and productive
discussion about HDFS-1312.
> Re-balance disks within a Datanode
> ----------------------------------
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Travis Crawford
> Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to "full disk woes" on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
> In write-heavy environments this will still make use of all spindles,
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.