[
https://issues.apache.org/jira/browse/HDFS-13783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887263#comment-16887263
]
Erik Krogen edited comment on HDFS-13783 at 7/17/19 5:19 PM:
-------------------------------------------------------------
Hey [~zhangchen], the idea is good, but I think we need some more work here
before we're ready to declare success... Top-level items to handle would be:
* A way to start/stop the service. Maybe we can re-use the existing {{hdfs
balancer}} command with a {{--service}} flag, and then we should be able to
just use {{hdfs --daemon start balancer}}.
* It seems like there should be more intelligent failure/exception handling: a
service which goes down every time it encounters an issue doesn't seem that
useful, maybe we should at least add some configurable count or time period of
retry-on-failure.
* Adding a new service seems like it deserves some documentation -- we can
probably piggyback off of the existing [balancer
documentation|https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer]
in the user guide.
A few more specific comments:
* {{BalancerService}} has a constructor, but it doesn't seem that it is used
anywhere.
* On L27, you shouldn't be creating a {{new HdfsConfiguration()}}, since this
won't respect config overrides on the CLI. You should use {{ToolRunner}} here
to initialize the config.
* Why do we need {{SecurityUtil.login()}}? Doesn't the balancer support
automatically doing login via keytab already (HDFS-9804)?
* We shouldn't be ignoring {{InterruptedException}}, this is how to indicate to
a service that it should exit. They should be fatal.
* Instead of using {{conf.getLong()}}, we should use
{{conf.getTimeDuration()}}, allowing administrators to use human-readable time
values.
By the way, your patch should be named like {{HDFS-13783.002.patch}} (period
before version number, not hyphen). See the [patch naming
guide|https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute#HowToContribute-Namingyourpatch].
was (Author: xkrogen):
Hey [~zhangchen], the idea is good, but I think we need some more work here
before we're ready to declare success... Top-level items to handle would be:
* A way to start/stop the service. Maybe we can re-use the existing {{hdfs
balancer}} command with a {{--service}} flag, and then we should be able to
just use {{hdfs --daemon start balancer}}.
* It seems like there should be more intelligent failure/exception handling: a
service which goes down every time it encounters an issue doesn't seem that
useful, maybe we should at least add some configurable count or time period of
retry-on-failure.
* Adding a new service seems like it deserves some documentation -- we can
probably piggyback off of the existing [balancer
documentation|https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer]
in the user guide.
A few more specific comments:
* {{BalancerService}} has a constructor, but it doesn't seem that it is used
anywhere.
* On L27, you shouldn't be creating a {{new HdfsConfiguration()}}, since this
won't respect config overrides on the CLI. You should use {{ToolRunner}} here
to initialize the config.
* Why do we need {{SecurityUtil.login()}}? Doesn't the balancer support
automatically doing login via keytab already (HDFS-9804)?
* We shouldn't be ignoring {{InterruptedException}}, this is how to indicate to
a service that it should exit. They should be fatal.
* Instead of using {{conf.getLong()}}, we should use
{{conf.getTimeDuration()}}, allowing administrators to use human-readable time
values.
By the way, your patch should be named like "HDFS-13783.002.patch" (period
before version number, not hyphen). See the [patch naming
guide|https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute#HowToContribute-Namingyourpatch].
> Balancer: make balancer to be a long service process for easy to monitor it.
> ----------------------------------------------------------------------------
>
> Key: HDFS-13783
> URL: https://issues.apache.org/jira/browse/HDFS-13783
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: balancer & mover
> Reporter: maobaolong
> Assignee: Chen Zhang
> Priority: Major
> Attachments: HDFS-13783-001.patch, HDFS-13783-002.patch
>
>
> If we have a long service process of balancer, like namenode, datanode, we
> can get metrics of balancer, the metrics can tell us the status of balancer,
> the amount of block it has moved,
> We can get or set the balance plan by the balancer webUI. So many things we
> can do if we have a long balancer service process.
> So, shall we start to plan the new Balancer? Hope this feature can enter the
> next release of hadoop.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]