[
https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951561#comment-13951561
]
Tsz Wo Nicholas Sze commented on HDFS-6010:
-------------------------------------------
The patch is generally good. Some comments:
- I think "-datanodes" may be a better name than "-servers". However, I
actually suggest not adding it as a CLI parameter since, for a large cluster,
it may not be easy to list all the selected datanodes on the command line. How about
adding a new conf property, say dfs.balancer.selectedDatanodes?
- The new class NodeStringValidator is unlikely to be used outside Balancer.
How about moving it to the balancer package and renaming it to BalancerUtil?
- In initNodes(..), if target == null, it will throw an
IllegalArgumentException. However, a balancer may run for a long time and some
datanodes could be down. I think we should not throw an exception here.
Printing a warning is probably good enough.
  - The new code could be moved to a static method (in BalancerUtil) so that it
    is easier to read.
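
To make the suggestions above concrete, here is a rough sketch of what such a
static helper might look like. It is only an illustration under assumptions:
the class name BalancerUtilSketch and both method names are hypothetical, the
property dfs.balancer.selectedDatanodes is the name proposed above and not an
existing key, and the value is passed in as a plain string so that the
example does not depend on any Hadoop class. It is not taken from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper; the names are illustrative and not taken from the patch.
final class BalancerUtilSketch {
    private static final Logger LOG =
        Logger.getLogger(BalancerUtilSketch.class.getName());

    // Property name proposed in the review comment (an assumption, not an existing key).
    static final String SELECTED_DATANODES_KEY = "dfs.balancer.selectedDatanodes";

    private BalancerUtilSketch() {}

    // Splits a comma-separated property value into an ordered set of host
    // names, trimming whitespace and ignoring empty entries.
    static Set<String> parseSelectedDatanodes(String value) {
        Set<String> selected = new LinkedHashSet<>();
        if (value == null) {
            return selected;
        }
        for (String token : value.split(",")) {
            String host = token.trim();
            if (!host.isEmpty()) {
                selected.add(host);
            }
        }
        return selected;
    }

    // Keeps only the selected hosts that are currently live. A missing host
    // is logged as a warning instead of raising IllegalArgumentException,
    // so a long-running balancer survives datanodes going down.
    static List<String> resolveTargets(Set<String> selected, Set<String> liveDatanodes) {
        List<String> targets = new ArrayList<>();
        for (String host : selected) {
            if (liveDatanodes.contains(host)) {
                targets.add(host);
            } else {
                LOG.warning("Selected datanode " + host + " is not available; skipping it");
            }
        }
        return targets;
    }
}
```

A caller would read the property value from the configuration, pass it to
parseSelectedDatanodes(..), and then call resolveTargets(..) with the set of
live datanodes at the start of each balancing iteration.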
I have not yet checked NodeStringValidator and the new tests in detail.
> Make balancer able to balance data among specified servers
> ----------------------------------------------------------
>
> Key: HDFS-6010
> URL: https://issues.apache.org/jira/browse/HDFS-6010
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer
> Affects Versions: 2.3.0
> Reporter: Yu Li
> Assignee: Yu Li
> Priority: Minor
> Labels: balancer
> Attachments: HDFS-6010-trunk.patch, HDFS-6010-trunk_V2.patch
>
>
> Currently, the balancer tool balances data among all datanodes. However, in
> some particular case, we would need to balance data only among specified
> nodes instead of the whole set.
> In this JIRA, a new "-servers" option would be introduced to implement this.