[
https://issues.apache.org/jira/browse/HDFS-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013306#comment-17013306
]
Ayush Saxena commented on HDFS-15067:
-------------------------------------
Thanks [~surendrasingh] for the updates.
For the configuration, I think it should stay a plain number. We need not
change the config to a percentage; as you said, that would be too much
mathematics for the admin to arrive at the right percentage. In any case, the
logic you chose, expressing the configuration as a number of heartbeats, is
justified. However, the "not specified" case and the "invalid value" case
currently resolve to two different values. I am of the opinion that, rather
than having two logics, we should have one. The default value can act as a
fallback: whether you don't configure it or you configure it wrong, I go back
to, say, x, rather than having one logic for invalid and another for not
specified. The default of 3 can itself end up being mathematically invalid,
and it then needs a fallback to the other logic.
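
To make the "one fallback" idea concrete, here is a minimal sketch. It is not
taken from the patch: the class name, the method name, the maxAllowed
parameter and the fallback value of 0 (meaning "skip nothing", so it can never
be invalid) are all hypothetical and stand in for the "x" above.

```java
// Hypothetical sketch, not code from HDFS-15067.
final class SkipCountConfig {
    // Assumed stand-in for "x": skipping zero heartbeats is always valid.
    static final int FALLBACK_SKIP_COUNT = 0;

    /**
     * Resolves the number of heartbeats to skip.
     *
     * @param raw        configured value, or null when the admin set nothing
     * @param maxAllowed assumed upper bound derived from other settings
     * @return the configured value if it is usable, else the single fallback
     */
    static int resolve(String raw, int maxAllowed) {
        if (raw == null) {
            return FALLBACK_SKIP_COUNT;            // not specified
        }
        try {
            int value = Integer.parseInt(raw.trim());
            boolean valid = value >= 0 && value <= maxAllowed;
            return valid ? value : FALLBACK_SKIP_COUNT;  // out of range
        } catch (NumberFormatException e) {
            return FALLBACK_SKIP_COUNT;            // not a number
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("2", 5));   // 2
        System.out.println(resolve(null, 5));  // 0
        System.out.println(resolve("abc", 5)); // 0
        System.out.println(resolve("9", 5));   // 0
    }
}
```

Unset, non-numeric and out-of-range values all take the same path, so there is
only one fallback behaviour to document and test.
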
I am of the opinion that the user configuration should be a number only, as
you said; just remove the default there, and let the doctor himself calculate
based on a medical college standard: if a patient can die in, say, 1 hour
without his medicine, he should ideally take it at the 20th or the 30th
minute. So keep the ball in the doctor's court, and we can trust the doctor.
That is better than a state where, if the patient doesn't tell the time, he
takes the medicine at every 9th minute of the hour, and if he configures it
wrong, he is on the edge and takes it at the 57th minute. Some issue in the
end, and the patient can die... :P
{quote}Admin should handle this configuration only if he know the NN and DN
communication pattern. Configuring wrong thing in big cluster is not accepted
and if he configured also he should correct it when he think system is behaving
abnormally.
{quote}
Agreed. Ideally this fallback to the default won't come up in a deployment
scenario, so it is not worth too much thought. I am OK with not tweaking much
here; these are just my thoughts. Check whether it is worth it :)
{quote}Do you think it will give some benefits ?. Standby/Observer anyway not
doing anything,
{quote}
If it is a very big cluster, it may shed some lakhs (hundreds of thousands) of
RPCs overall before reaching the threshold. Other than that, I don't see much
benefit... I don't have much affection for the standby, so if you feel it is
not safe or not worth doing, I am fine with ignoring this.
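
As a rough illustration of the scale, using the numbers from the issue
description (10K DataNodes, a heartbeat every 3 sec versus every 6 sec): the
sketch below assumes every DataNode is idle and skipping for the whole hour,
so it is an upper bound rather than a measurement. The class and method names
are hypothetical.

```java
// Hypothetical back-of-the-envelope estimate, not code from the patch.
final class HeartbeatRpcEstimate {
    /** Heartbeat RPCs one NameNode receives per hour from all DataNodes. */
    static long heartbeatsPerHour(long dataNodes, long intervalSeconds) {
        return dataNodes * 3600L / intervalSeconds;
    }

    public static void main(String[] args) {
        long normal = heartbeatsPerHour(10_000, 3);   // 12000000
        long skipping = heartbeatsPerHour(10_000, 6); // 6000000
        System.out.println(normal);
        System.out.println(skipping);
        System.out.println(normal - skipping);        // 6000000 fewer RPCs
    }
}
```
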
Overall, the idea and logic look considerably good and safe. You may need to
update this in the documentation too. There is one really good image in the
comment above; you can use it there. :)
> Optimize heartbeat for large cluster
> ------------------------------------
>
> Key: HDFS-15067
> URL: https://issues.apache.org/jira/browse/HDFS-15067
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 3.1.1
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Attachments: HDFS-15067.01.patch, HDFS-15067.02.patch,
> image-2020-01-09-18-00-49-556.png
>
>
> In a large cluster, the Namenode spends some time processing heartbeats. For
> example, in a 10K-node cluster the Namenode processes 10K heartbeat RPCs
> every 3 sec. This impacts the client response time. This heartbeat can be
> optimized. A DN can start skipping one heartbeat if no work
> (write/replication/delete) has been allocated to it for a long time, so that
> it sends a heartbeat every 6 sec. Once the DN starts getting work from the
> NN, it can go back to sending heartbeats normally.