[
https://issues.apache.org/jira/browse/HDDS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273774#comment-17273774
]
Yiqun Lin commented on HDDS-4754:
---------------------------------
Good catch, [~yjxxtd]!
I see that the DN heartbeat interval (hdds.heartbeat.interval) currently defaults
to 30s, so could we also use HddsConfigKeys#HDDS_HEARTBEAT_INTERVAL_DEFAULT as
the retry interval here?
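A minimal sketch of what that change would compute. It assumes the default is the duration string `30s` and converts it to the millisecond sleep that the retry policy in `addSCMServer` takes. The class `HeartbeatRetryInterval`, its `toMillis` parser and the `HEARTBEAT_INTERVAL_DEFAULT` constant are hypothetical stand-ins. Real Ozone code would read the value through its configuration API rather than a hand-written parser.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: turn a duration such as "30s" into a retry sleep in ms. */
final class HeartbeatRetryInterval {
  // Assumed to mirror the 30s default of hdds.heartbeat.interval.
  static final String HEARTBEAT_INTERVAL_DEFAULT = "30s";

  // The sleep currently hard-coded in addSCMServer.
  static final long CURRENT_RETRY_SLEEP_MS = 1000L;

  /** Parses "<number><unit>", where unit is ms, s, m or h. */
  static long toMillis(String duration) {
    String d = duration.trim().toLowerCase(Locale.ROOT);
    TimeUnit unit;
    int suffixLength;
    // "ms" must be checked before "s" and "m".
    if (d.endsWith("ms")) {
      unit = TimeUnit.MILLISECONDS;
      suffixLength = 2;
    } else if (d.endsWith("s")) {
      unit = TimeUnit.SECONDS;
      suffixLength = 1;
    } else if (d.endsWith("m")) {
      unit = TimeUnit.MINUTES;
      suffixLength = 1;
    } else if (d.endsWith("h")) {
      unit = TimeUnit.HOURS;
      suffixLength = 1;
    } else {
      throw new IllegalArgumentException("Missing time unit in: " + duration);
    }
    long amount = Long.parseLong(d.substring(0, d.length() - suffixLength).trim());
    return unit.toMillis(amount);
  }

  public static void main(String[] args) {
    long retrySleepMs = toMillis(HEARTBEAT_INTERVAL_DEFAULT);
    // Prints "retry sleep: 1000 ms -> 30000 ms".
    System.out.println("retry sleep: " + CURRENT_RETRY_SLEEP_MS
        + " ms -> " + retrySleepMs + " ms");
  }
}
```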
> A restarted SCM quickly goes OOM due to a ContainerReport storm from the DN cluster.
> ----------------------------------------------------------------------------
>
> Key: HDDS-4754
> URL: https://issues.apache.org/jira/browse/HDDS-4754
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: runzhiwang
> Priority: Major
> Attachments: 企业微信截图_1611734015772.png (WeCom screenshot)
>
>
> During Tencent's monthly upgrade, we restart all DNs first, then stop the SCM,
> wait for a while, and start it again. The SCM goes OOM within a short time.
> The current DN retry policy retries with a fixed 1s sleep between attempts.
> Because all DNs lose their connection to the SCM at the same moment when the
> SCM restarts, they all send their container reports to the SCM at nearly the
> same time, which causes a ContainerReport storm.
> We propose changing the datanode retry policy for connecting to the SCM.
> {code:java}
> public void addSCMServer(InetSocketAddress address) throws IOException {
>   writeLock();
>   try {
>     if (scmMachines.containsKey(address)) {
>       LOG.warn("Trying to add an existing SCM Machine to Machines group. " +
>           "Ignoring the request.");
>       return;
>     }
>     Configuration hadoopConfig =
>         LegacyHadoopConfigurationSource.asHadoopConfiguration(this.conf);
>     RPC.setProtocolEngine(
>         hadoopConfig,
>         StorageContainerDatanodeProtocolPB.class,
>         ProtobufRpcEngine.class);
>     long version =
>         RPC.getProtocolVersion(StorageContainerDatanodeProtocolPB.class);
>     // Every DN sleeps a fixed 1000 ms between retries, so DNs that lose
>     // the SCM together also retry (and report) together.
>     RetryPolicy retryPolicy =
>         RetryPolicies.retryUpToMaximumCountWithFixedSleep(
>             getScmRpcRetryCount(conf),
>             1000, TimeUnit.MILLISECONDS);
>     // ... (rest of the method not shown)
> {code}
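To illustrate the storm described above, here is a small self-contained toy model, not Ozone code. N hypothetical DNs all lose the SCM at t = 0 and retry every `sleepMs`; the SCM is reachable again from `scmUpMs`. The model treats each DN's first successful attempt as its container report and counts how many land in the busiest one-second window. The jittered variant (a random sleep in [sleep/2, 3*sleep/2)) is one common way to desynchronize clients. It is included only as an assumption for comparison, not as the policy HDDS-4754 adopted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Toy model of DN reconnect timing after an SCM restart (not Ozone code). */
final class ReportStormSketch {

  /**
   * All dnCount DNs lose the SCM at t = 0 and retry every sleepMs, or after a
   * random sleep in [sleepMs/2, 3*sleepMs/2) when jitter is on. The SCM is up
   * again from scmUpMs. Returns the largest number of first successful
   * attempts (modelled as container reports) falling into one 1-second bucket.
   */
  static int peakReportsPerSecond(int dnCount, long sleepMs, long scmUpMs,
                                  boolean jitter, long seed) {
    if (sleepMs < 2) {
      throw new IllegalArgumentException("sleepMs must be >= 2");
    }
    Random random = new Random(seed);
    Map<Long, Integer> reportsPerSecond = new HashMap<>();
    int peak = 0;
    for (int dn = 0; dn < dnCount; dn++) {
      long t = 0; // every DN notices the SCM is gone at the same instant
      // Keep retrying until an attempt happens while the SCM is up.
      while (t < scmUpMs) {
        long sleep = jitter
            ? sleepMs / 2 + (long) (random.nextDouble() * sleepMs)
            : sleepMs;
        t += sleep;
      }
      int count = reportsPerSecond.merge(t / 1000, 1, Integer::sum);
      peak = Math.max(peak, count);
    }
    return peak;
  }

  public static void main(String[] args) {
    int dns = 1000;
    long scmUpMs = 60_500;
    System.out.println("fixed 1s:   " + peakReportsPerSecond(dns, 1_000, scmUpMs, false, 42));
    System.out.println("fixed 30s:  " + peakReportsPerSecond(dns, 30_000, scmUpMs, false, 42));
    System.out.println("jitter 30s: " + peakReportsPerSecond(dns, 30_000, scmUpMs, true, 42));
  }
}
```

In this model both fixed-sleep runs report all 1000 DNs in the same second, so a longer fixed interval only delays the storm. Randomizing the sleep spreads the reports over many seconds.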
--
This message was sent by Atlassian Jira
(v8.3.4#803005)