[
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951867#comment-13951867
]
Yu Li commented on HDFS-6010:
-----------------------------
Thanks for the review and comments, Tsz.
{quote}
I think "-datanodes" may be a better name than "-servers"...How about adding a
new conf property, say dfs.balancer.selectedDatanodes?
{quote}
IMHO, making it a CLI option lets the user choose dynamically which nodes to
balance among, whereas a conf property is static. In our use case, the admin
might balance groupA and groupB separately, and a CLI option would make that
easier, right?
I agree to rename the option to "-datanodes" if we decide to keep it as a CLI
option.
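To make the CLI idea concrete, here is a minimal sketch of how a
comma-separated "-datanodes" value might be parsed. The class name
DatanodesOption and its parse method are hypothetical and are not taken from
the attached patch; the sketch only assumes the option is followed by a single
comma-separated argument.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not code from the HDFS-6010 patch. */
final class DatanodesOption {

  private DatanodesOption() {}

  /**
   * Collects the node names given after "-datanodes" in args.
   * Returns an empty list when the option is absent.
   */
  static List<String> parse(String[] args) {
    List<String> nodes = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      if (!"-datanodes".equals(args[i])) {
        continue; // other options are ignored by this sketch
      }
      if (i + 1 >= args.length) {
        throw new IllegalArgumentException(
            "-datanodes requires a comma-separated list of nodes");
      }
      // Consume the option value and split it on commas.
      for (String part : args[++i].split(",")) {
        String name = part.trim();
        if (!name.isEmpty()) {
          nodes.add(name);
        }
      }
    }
    return nodes;
  }
}
```

With something like this, the admin could pass one node list for groupA and
another for groupB on separate runs without touching any configuration file.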
{quote}
How about moving it to the balancer package and renaming it to BalancerUtil?
{quote}
Agree to move it to the balancer package. As for the name, since it currently
only validates whether a given string matches a live datanode, the name
"BalancerUtil" seems too broad to me. :-)
{quote}
a balancer may run for a long time and some datanodes could be down. I think we
should not throw exceptions. Perhaps, printing a warning is good enough
{quote}
It's true that some datanodes could be down, but I'd like to discuss this
scenario further. Assume groupA has 3 nodes and node #1 is down. When the admin
issues a command like "-datanodes 1,2,3", the intent is to balance data across
all 3 nodes. If we only print warnings, the balancer will first balance data
between nodes #2 and #3, and after node #1 comes back the admin has to run
another round of balancing. Since each round adds read locks to the involved
blocks and causes disk/network IO, in our production environment we would
prefer to fail the first attempt and wait until all datanodes are back. So I'd
like to ask you to reconsider whether to throw an exception or print a warning
here.
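The two behaviours under discussion can be sketched side by side. This is only
an illustration under stated assumptions: the class SelectedDatanodeCheck, the
Policy enum and the resolve method are hypothetical names, node identities are
plain strings, and none of it is taken from the patch or from the Balancer
code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not code from the HDFS-6010 patch. */
final class SelectedDatanodeCheck {

  /** How to react when a requested datanode is not live. */
  enum Policy { FAIL_FAST, WARN_AND_CONTINUE }

  private SelectedDatanodeCheck() {}

  /**
   * Returns the requested nodes that are live, in request order.
   * Under FAIL_FAST, throws if any requested node is not in liveNodes.
   */
  static List<String> resolve(
      List<String> requested, Set<String> liveNodes, Policy policy) {
    List<String> usable = new ArrayList<>();
    List<String> missing = new ArrayList<>();
    // LinkedHashSet drops duplicate requests while keeping their order.
    for (String node : new LinkedHashSet<>(requested)) {
      if (liveNodes.contains(node)) {
        usable.add(node);
      } else {
        missing.add(node);
      }
    }
    if (!missing.isEmpty()) {
      if (policy == Policy.FAIL_FAST) {
        // Refuse to start, so the admin can retry once all nodes are back.
        throw new IllegalArgumentException(
            "Requested datanodes are not live: " + missing);
      }
      // Proceed with the live subset; a second round may be needed later.
      System.err.println("WARN: skipping datanodes that are not live: " + missing);
    }
    return usable;
  }
}
```

In the groupA example, FAIL_FAST rejects "1,2,3" while node #1 is down, whereas
WARN_AND_CONTINUE balances only nodes #2 and #3. Making the policy a parameter
is just one way to keep both options open while the discussion continues.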
{quote}
The new code could be moved to a static method (in BalancerUtil) so that it is
easier to read.
{quote}
Agree, will refine the code regardless of whether we change from throwing an
exception to printing a warning.
{quote}
I have not yet checked NodeStringValidator and the new tests in details
{quote}
No problem, I will wait for your comments and update the patch in one go, along
with all the changes required after the above discussion.
> Make balancer able to balance data among specified servers
> ----------------------------------------------------------
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer
> Affects Versions: 2.3.0
> Reporter: Yu Li
> Assignee: Yu Li
> Priority: Minor
> Labels: balancer
> Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in
> some cases we need to balance data only among specified nodes instead of
> the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.