[
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102381#comment-15102381
]
Anu Engineer commented on HDFS-1312:
------------------------------------
*Notes from the call on Jan 14th, 2016*
Attendees: Andrew Wang, Lei Xu, Colin McCabe, Chris Trezzo, Ming Ma, Arpit
Agarwal, Jitendra Pandey, Jing Zhao, Mingliang Liu, Xiaobing Zhou, Anu
Engineer, and others who dialed in (I could only see phone numbers, not names;
my apologies to anyone I am missing).
We discussed the goals of HDFS-1312. Andrew Wang mentioned that HDFS-1804 is
used by many customers, is safe, and has been used in production for a while.
Jitendra pointed out that we still have many customers who are not using
HDFS-1804, so he suggested that we focus the discussion on HDFS-1312. We
explored the pros and cons of having the planner completely inside the
datanode, along with various other user scenarios. As a team we wanted to make
sure that all major scenarios are identified and covered in this review.
Ming Ma raised an interesting question, which we decided to address: he wanted
to find out whether running the disk balancer has any quantifiable performance
effect. Anu mentioned that since we have bandwidth control, admins should be
able to control it. However, any disk I/O has a cost, so we decided to do some
performance measurement of the disk balancer.
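For context, a rough sketch of how a bandwidth cap could bound the I/O cost of
a block copy loop, using the DataTransferThrottler that already exists in HDFS
(the copy loop itself is illustrative, not the actual disk balancer data path):

{code:java}
import org.apache.hadoop.hdfs.util.DataTransferThrottler;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ThrottledCopy {
  // Copies data at no more than bandwidthPerSec bytes per second, so
  // balancing I/O stays within the admin-configured bandwidth budget.
  static void copy(InputStream in, OutputStream out, long bandwidthPerSec)
      throws IOException {
    DataTransferThrottler throttler = new DataTransferThrottler(bandwidthPerSec);
    byte[] buf = new byte[64 * 1024];
    int read;
    while ((read = in.read(buf)) != -1) {
      out.write(buf, 0, read);
      throttler.throttle(read); // sleeps if we are over the bandwidth budget
    }
  }
}
{code}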
Andrew Wang raised the question of performance counters and how external tools
like Cloudera Manager or Ambari would use the disk balancer. He also explored
how we will be able to integrate this tool with other management tools. We
agreed that we will have a set of performance counters exposed via datanode
JMX. We also discussed design trade-offs of doing disk balancing inside the
datanode vs. outside. We reviewed lots of administrative scenarios and
concluded that this tool would be able to address them. We also concluded that
the tool does not do any cluster-wide planning and that all data movement is
confined to the datanode.
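As a sketch of what those counters could look like (the counter names here are
illustrative, not an agreed-upon set), Hadoop's metrics2 library already
publishes registered sources through the datanode's JMX endpoint:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Hypothetical metrics source; once registered, these counters appear
// under the datanode's JMX alongside the existing metrics2 sources.
@Metrics(name = "DiskBalancerMetrics", about = "Disk balancer metrics",
         context = "dfs")
public class DiskBalancerMetrics {
  @Metric("Bytes moved between volumes") MutableCounterLong bytesMoved;
  @Metric("Blocks moved between volumes") MutableCounterLong blocksMoved;
  @Metric("Errors while moving blocks") MutableCounterLong moveErrors;

  public static DiskBalancerMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "DiskBalancerMetrics", "Disk balancer metrics",
        new DiskBalancerMetrics());
  }

  public void incrBytesMoved(long delta) {
    bytesMoved.incr(delta);
  }
}
{code}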
Colin McCabe brought up a set of interesting questions. He made us think
through the scenario of data changing in the datanode while the disk balancer
is operational, as well as the impact of future disks with shingled magnetic
recording and large disk sizes. He was wondering how long it would take to
balance a datanode if it is filled with 6 TB or even 20 TB drives. The
conclusion was that if you had large slow disks and lots of data in a node, it
would take proportionally more time. For the question of data changing in the
datanode, the disk balancer would support a tolerance value, or "good enough"
value, for balancing. That is, an administrator can specify that getting
within 10% of the expected data distribution is good enough. We also discussed
a scenario called "Hot Remove": just like hot swap, small cluster owners might
find it useful to move all data out of a hard disk before removing it, say to
upgrade to a larger size.
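A minimal sketch of the tolerance check described above (the method name and
per-volume inputs are assumptions for illustration, not the design's actual
interface):

{code:java}
public class ToleranceCheck {
  /**
   * Returns true if a volume's used ratio is within tolerancePercent
   * percentage points of the ideal ratio, i.e. "good enough" to stop
   * balancing. For example, tolerancePercent = 10 accepts any volume
   * within 10 points of the expected data distribution.
   */
  static boolean isWithinTolerance(long usedBytes, long capacityBytes,
                                   double idealUsedRatio,
                                   double tolerancePercent) {
    double usedRatio = (double) usedBytes / capacityBytes;
    return Math.abs(usedRatio - idealUsedRatio) * 100.0 <= tolerancePercent;
  }
}
{code}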
Ming Ma pointed out that for them it is easier and simpler to decommission a
node; if you have a large number of nodes, relying on the network is more
efficient than micro-managing a datanode. We agreed with that, but for small
cluster owners (say, fewer than 5 or 10 nodes) it might make sense to support
the ability to move data out of a disk. Anu pointed out that the disk balancer
design does accommodate that capability even though it is not the primary goal
of the tool.
Ming Ma also brought up how Twitter runs the balancer tool today: it is always
running against Twitter's clusters. We discussed whether having that balancer
as a part of the namenode makes sense, but concluded that it was out of scope
for HDFS-1312. Andrew mentioned that it is the right thing to do in the long
run. We also discussed whether the disk balancer should trigger automatically
instead of being an administrator-driven task. We were worried that it would
trigger and incur I/O when higher-priority compute jobs were running in the
cluster, so we decided we are better off letting an admin decide when it is a
good time to run the disk balancer.
At the end of the review, Andrew asked if we could finish this work by the end
of next month and offered to help make sure that this feature is done sooner.
*Action Items:*
* Analyze performance impact of disk balancer.
* Add a set of performance counters exposed via datanode JMX.
Please feel free to comment / correct these notes if I have missed anything.
Thank you all for calling in and for having such a great and productive
discussion about HDFS-1312.
> Re-balance disks within a Datanode
> ----------------------------------
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Reporter: Travis Crawford
> Assignee: Anu Engineer
> Attachments: Architecture_and_testplan.pdf, disk-balancer-proposal.pdf
>
>
> Filing this issue in response to "full disk woes" on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations
> where certain disks are full while others are significantly less used. Users
> at many different sites have experienced this issue, and HDFS administrators
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling
> disks at roughly the same rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode.
> In write-heavy environments this will still make use of all spindles,
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is
> not needed.