[
https://issues.apache.org/jira/browse/HDFS-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011764#comment-17011764
]
Surendra Singh Lilhore commented on HDFS-15067:
-----------------------------------------------
Design Idea:
============
Added two new properties:
|*Property*|*Description*|*Default value*|
|dfs.datanode.heartbeat.optimizer.skip.max.heartbeat|Maximum number of heartbeats that can be skipped at one time.|3|
|dfs.datanode.heartbeat.optimizer.max.idle.time|Datanode idle time after which it starts skipping heartbeats. The default value of 0 means this feature is disabled.|0|
The user needs to configure the maximum number of heartbeats to skip and the
datanode maximum idle time. Once the datanode has been idle for this time, it
starts skipping heartbeats incrementally: after the first idle window elapses
it skips one heartbeat, after two idle windows it skips two heartbeats, and so
on. It still guarantees that at least one heartbeat is sent before the stale
interval expires.
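The incremental schedule described above could be sketched roughly as follows.
This is only an illustration, not code from the patch: the class name, the
method name, and the use of milliseconds for the idle time are assumptions, and
maxSkip is assumed to be already capped so that a heartbeat still arrives
before the stale interval.

```java
// Sketch only: names and millisecond units are assumptions, not the HDFS-15067 patch.
final class HeartbeatSkipSchedule {

    /**
     * Number of heartbeats to skip for a datanode that has been idle for idleMs.
     * One more heartbeat is skipped per elapsed idle window, capped at maxSkip.
     */
    static long heartbeatsToSkip(long idleMs, long maxIdleMs, long maxSkip) {
        if (maxIdleMs <= 0) {
            return 0; // max.idle.time = 0 means the feature is disabled
        }
        long elapsedWindows = idleMs / maxIdleMs; // whole idle windows elapsed so far
        return Math.max(0, Math.min(elapsedWindows, maxSkip));
    }

    public static void main(String[] args) {
        long maxIdleMs = 60_000; // hypothetical idle window of 60 s
        long maxSkip = 3;        // default of ...optimizer.skip.max.heartbeat
        for (long idleMs : new long[] {0, 60_000, 120_000, 180_000, 600_000}) {
            System.out.println(idleMs + " ms idle -> skip "
                + heartbeatsToSkip(idleMs, maxIdleMs, maxSkip));
        }
    }
}
```

With these illustrative values the skip count grows 0, 1, 2, 3 as idle windows
elapse and then stays at 3.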
The main logic is deciding how many heartbeats can be skipped at one time, and
this depends on the stale interval of the Namenode. That property is not
available on the datanode, so the value must be obtained from the Namenode; the
datanode can receive it in the DatanodeRegistration returned by the Namenode at
registration time. The maximum number of heartbeats to skip is then calculated
with this formula:
*Max heartbeat to skip = min((staleInterval - heartbeatInterval) / heartbeatInterval, configuredMaxHeartbeatSkip);*
!image-2020-01-09-18-00-49-556.png!
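As a worked example of the formula above, here is a minimal sketch in Java. The
class and method names are hypothetical, the intervals are assumed to be in
milliseconds, and the clamp to zero is an added safeguard that the formula
itself does not mention.

```java
// Sketch only: hypothetical names; intervals assumed to be in milliseconds.
final class MaxHeartbeatSkip {

    /** min((staleInterval - heartbeatInterval) / heartbeatInterval, configuredMaxHeartbeatSkip) */
    static long maxHeartbeatsToSkip(long staleIntervalMs, long heartbeatIntervalMs,
                                    long configuredMaxSkip) {
        if (heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("heartbeat interval must be positive");
        }
        // Heartbeats that fit in the stale interval, minus the one that must still be sent.
        long staleBound = (staleIntervalMs - heartbeatIntervalMs) / heartbeatIntervalMs;
        // Clamp to zero (an assumption) in case staleInterval < heartbeatInterval.
        return Math.max(0, Math.min(staleBound, configuredMaxSkip));
    }

    public static void main(String[] args) {
        // 30 s stale interval, 3 s heartbeat: staleBound = 9, so the configured cap of 3 applies.
        System.out.println(maxHeartbeatsToSkip(30_000, 3_000, 3));
        // 9 s stale interval, 3 s heartbeat: staleBound = 2, so the stale interval is the limit.
        System.out.println(maxHeartbeatsToSkip(9_000, 3_000, 3));
    }
}
```

In the first call the configured maximum is the limiting term; in the second
the stale interval is, which is how the guarantee of one heartbeat before the
stale interval is preserved.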
> Optimize heartbeat for large cluster
> ------------------------------------
>
> Key: HDFS-15067
> URL: https://issues.apache.org/jira/browse/HDFS-15067
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.1.1
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Attachments: image-2020-01-09-18-00-49-556.png
>
>
> In a large cluster, the Namenode spends some time processing heartbeats. For
> example, in a 10K-node cluster the Namenode processes 10K heartbeat RPCs
> every 3 seconds, which impacts client response time. Heartbeats can be
> optimized: a DN can start skipping one heartbeat if no work
> (write/replication/delete) has been allocated to it for a long time, so that
> it sends a heartbeat every 6 seconds. Once the DN starts getting work from
> the NN, it can resume sending heartbeats normally.