[
https://issues.apache.org/jira/browse/HDFS-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013532#comment-17013532
]
Ayush Saxena commented on HDFS-15067:
-------------------------------------
Started reading the code. One doubt:
It seems you reset the heartbeats when the bps turns active, but there is a
condition right after that:
{code:java}
} else if (!nnClaimsActive && bposThinksActive) {
{code}
In layman's terms, this condition checks whether the known active has turned to
standby. In this case we should ideally reset the heartbeats for all the bps,
so that the new active can be identified. Otherwise the bps tracking the
standby will be at the max dn interval, so it will be slow to identify the
new active.
The other concern: if at some stage there is no active and all namenodes are in
standby for some time, the dn interval will shoot up to the max, since a standby
won't give any instructions to the dn. If the active comes up after the dns
have reached the max threshold, identification of the active namenode will be
delayed. I think we should check whether {{bpServiceToActive}} is null before
we get into delaying the heartbeat.
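
To make both points concrete, here is a rough sketch of the behaviour I have in mind. It is not taken from the patch: {{ActorSketch}}, {{OfferServiceSketch}}, {{activeActor}} (standing in for {{bpServiceToActive}}), the method names and the doubling back-off are all hypothetical, and the 3 sec / 6 sec values are just the ones from the issue description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one per-NameNode heartbeat actor (not the real BPServiceActor). */
class ActorSketch {
    static final long BASE_INTERVAL_MS = 3_000; // normal interval, as in the issue description
    static final long MAX_INTERVAL_MS = 6_000;  // assumed cap for the delayed heartbeat

    long intervalMs = BASE_INTERVAL_MS;

    void resetHeartbeat() {
        intervalMs = BASE_INTERVAL_MS;
    }

    /** Delay the heartbeat only when an active NameNode is known and it gave no work. */
    void maybeBackOff(boolean activeKnown, boolean gotWork) {
        if (!activeKnown || gotWork) {
            resetHeartbeat();
            return;
        }
        // Assumed policy: double the interval up to the cap.
        intervalMs = Math.min(intervalMs * 2, MAX_INTERVAL_MS);
    }
}

/** Hypothetical stand-in for the block pool service that tracks which NameNode is active. */
class OfferServiceSketch {
    final List<ActorSketch> actors = new ArrayList<>();
    ActorSketch activeActor; // plays the role of bpServiceToActive; null means no known active

    void updateState(ActorSketch actor, boolean nnClaimsActive) {
        boolean bposThinksActive = (activeActor == actor);
        if (nnClaimsActive && !bposThinksActive) {
            // A NameNode became active: track it and heartbeat at the normal rate.
            activeActor = actor;
            actor.resetHeartbeat();
        } else if (!nnClaimsActive && bposThinksActive) {
            // The known active turned standby: reset every actor so that
            // whichever NameNode becomes active next is noticed quickly.
            activeActor = null;
            for (ActorSketch a : actors) {
                a.resetHeartbeat();
            }
        }
    }

    void afterHeartbeat(ActorSketch actor, boolean gotWork) {
        // Second concern: never delay heartbeats while no active is known.
        actor.maybeBackOff(activeActor != null, gotWork);
    }
}
```

With this shape, a failover or a period with only standbys keeps every actor at the normal interval until an active is identified again, at the cost of losing the RPC savings during that window.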
> Optimize heartbeat for large cluster
> ------------------------------------
>
> Key: HDFS-15067
> URL: https://issues.apache.org/jira/browse/HDFS-15067
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode
> Affects Versions: 3.1.1
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Attachments: HDFS-15067.01.patch, HDFS-15067.02.patch,
> image-2020-01-09-18-00-49-556.png
>
>
> In a large cluster, the Namenode spends some time processing heartbeats. For
> example, in a 10K-node cluster the namenode processes 10K heartbeat RPCs every
> 3 sec. This impacts client response time. The heartbeat can be
> optimized: a DN can start skipping one heartbeat if no
> work (write/replication/delete) has been allocated for a long time, so that it
> sends a heartbeat every 6 sec. Once the DN starts getting work from the NN, it
> can go back to sending heartbeats normally.