[jira] [Commented] (HDFS-15067) Optimize heartbeat for large cluster

Ayush Saxena (Jira) Fri, 10 Jan 2020 17:15:45 -0800


    [ https://issues.apache.org/jira/browse/HDFS-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013306#comment-17013306 ]


Ayush Saxena commented on HDFS-15067:
-------------------------------------

Thanks [~surendrasingh] for the updates.
For the configuration, I think it should be a number only. We need not change
the config to use a percentage; as you said, that would be too much math for
the admin to work out. In any case, the logic you used, expressing the
configuration as a number of heartbeats, is justified. However, there are
currently two fallback values: one when the config is not specified and another
when an invalid value is specified. Rather than having two logics, I would
prefer one: the default acts as a single fallback, so whether you don't
configure it or configure it wrong, we go back to, say, x. The default of 3 can
also turn out to be mathematically invalid, and then it needs a fallback to the
other logic anyway.
I am of the opinion that the user configuration should be a number only, as
you said. Just remove the default there and let the doctor calculate it
himself, based on a medical-college standard: if a patient can die in, say,
1 hour without his medicine, he should ideally take it at the 20th or the 30th
minute. So keep the ball in the doctor's court; we can trust the doctor. That
is better than a state where, if the patient doesn't give a time, he takes the
medicine at every 9th minute of the hour, and if he configures it wrong, he
plays it on the edge and takes it at the 57th minute. If some issue crops up at
the end, the patient can die... :P
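The single-fallback idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch's code: the function `resolve_skip_count`, the `max_silence_sec` bound used to decide whether a value is "mathematically valid", and the fallback of 0 (no skipping) are all hypothetical choices made for the example.

```python
from typing import Optional

# Hypothetical fallback: "skip nothing". Unlike a fixed default such as 3,
# 0 can never itself be mathematically invalid, so no second logic is needed.
FALLBACK_SKIP_COUNT = 0


def resolve_skip_count(configured: Optional[int], heartbeat_interval_sec: float,
                       max_silence_sec: float) -> int:
    """Return how many heartbeats an idle DataNode may skip.

    A configured value is accepted only if it is non-negative and the gap it
    produces, (configured + 1) * heartbeat_interval_sec, stays strictly below
    max_silence_sec (an assumed safety bound). "Not specified" (None) and
    "invalid" take the same single fallback path.
    """
    if (configured is not None and configured >= 0
            and (configured + 1) * heartbeat_interval_sec < max_silence_sec):
        return configured
    return FALLBACK_SKIP_COUNT
```

For example, with a 3 s heartbeat and an assumed 30 s bound, `resolve_skip_count(3, 3.0, 30.0)` returns 3 (a 12 s gap), while `None`, `-1`, and `9` (a 30 s gap) all return 0 through the same path.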
{quote}Admin should handle this configuration only if he knows the NN and DN
communication pattern. Configuring the wrong thing in a big cluster is not
acceptable, and if he does configure it wrongly, he should correct it when he
thinks the system is behaving abnormally.
{quote}
Agreed. Ideally this fallback to the default won't come up in a deployment
scenario, so it's not worth too much thought. I am OK with not tweaking this
much; these are just my thoughts. Give it a check, if it's worth it :)
{quote}Do you think it will give some benefit? Standby/Observer are not doing
anything anyway.
{quote}
If it is a very big cluster, it may shed some hundreds of thousands of RPCs as
a whole before reaching the threshold. Other than that, I don't see much
benefit. I don't have much affection for the standby, so if you feel it is not
safe or not worth doing, I am fine with ignoring this.
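A rough estimate of the savings on a Standby/Observer can be worked out from the numbers in the issue description (10K DataNodes, 3 s heartbeat, 6 s when skipping one). The sketch below assumes every DataNode is idle and heartbeats to each NameNode in the nameservice at the same interval; the function names are hypothetical.

```python
def heartbeat_rpcs(num_datanodes: float, interval_sec: float, window_sec: float) -> float:
    """Heartbeat RPCs one NameNode receives in window_sec when each DataNode
    sends one heartbeat every interval_sec."""
    return num_datanodes * window_sec / interval_sec


def rpcs_saved(num_datanodes: float, base_interval_sec: float, skip_count: int,
               window_sec: float, idle_fraction: float = 1.0) -> float:
    """RPCs saved when idle_fraction of the DataNodes skip skip_count heartbeats
    per heartbeat sent, stretching their interval to base * (skip_count + 1)."""
    idle = num_datanodes * idle_fraction
    before = heartbeat_rpcs(idle, base_interval_sec, window_sec)
    after = heartbeat_rpcs(idle, base_interval_sec * (skip_count + 1), window_sec)
    return before - after
```

Under these assumptions, `heartbeat_rpcs(10_000, 3, 3600)` gives 12,000,000 heartbeat RPCs per hour per NameNode, and `rpcs_saved(10_000, 3, 1, 3600)` gives 6,000,000 per hour; real savings depend on how many DataNodes actually stay idle.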

Overall, the idea and logic look considerably good and safe. You may need to
update the documentation too; there is one really good image in the comment
above that you can use there. :)

> Optimize heartbeat for large cluster
> ------------------------------------
>
>                 Key: HDFS-15067
>                 URL: https://issues.apache.org/jira/browse/HDFS-15067
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode
>    Affects Versions: 3.1.1
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>            Priority: Major
>         Attachments: HDFS-15067.01.patch, HDFS-15067.02.patch, 
> image-2020-01-09-18-00-49-556.png
>
>
> In a large cluster, the NameNode spends some time processing heartbeats. For
> example, in a 10K-node cluster the NameNode processes 10K heartbeat RPCs every
> 3 sec. This impacts client response time. Heartbeats can be optimized: a DN
> can start skipping one heartbeat if no work (write/replication/delete) has
> been allocated to it for a long time, i.e. it sends a heartbeat every 6 sec.
> Once the DN starts getting work from the NN again, it resumes sending
> heartbeats normally.
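The behaviour described in the issue can be modelled as a small state machine. This is a hedged sketch, not the HDFS-15067 patch: the class `AdaptiveHeartbeat`, the `idle_threshold` trigger for "no work for a long time", and the boolean `got_work` (standing in for any write/replication/delete activity) are illustrative assumptions.

```python
class AdaptiveHeartbeat:
    """Hypothetical model of a DataNode heartbeat cadence that slows when idle."""

    def __init__(self, base_interval_sec: float = 3.0, idle_threshold: int = 10,
                 skip_count: int = 1) -> None:
        self.base_interval_sec = base_interval_sec
        # Consecutive work-free heartbeats before the DataNode starts skipping.
        self.idle_threshold = idle_threshold
        # Heartbeats skipped per heartbeat sent once idle (1 => 3 s becomes 6 s).
        self.skip_count = skip_count
        self.idle_heartbeats = 0

    def next_interval_sec(self) -> float:
        """Delay before the next heartbeat is sent."""
        if self.idle_heartbeats >= self.idle_threshold:
            return self.base_interval_sec * (self.skip_count + 1)
        return self.base_interval_sec

    def on_heartbeat_response(self, got_work: bool) -> None:
        """Record whether the last heartbeat cycle brought any work."""
        if got_work:
            self.idle_heartbeats = 0  # return to the normal cadence immediately
        else:
            self.idle_heartbeats += 1
```

For example, with `idle_threshold=2` and `skip_count=1`, two work-free responses move the interval from 3.0 to 6.0 seconds, and the next response carrying work drops it back to 3.0. Resetting on any work keeps the slowdown conservative: a busy DataNode never waits longer than the base interval.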



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
