[
https://issues.apache.org/jira/browse/HDDS-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Glen Geng updated HDDS-4754:
----------------------------
Description:
During our upgrade, we restart all DNs first, then stop the SCM, wait for a
while, and start it again.
Current retry policy is
Suppose that at some point all the DNs lose their connection to the SCM at the
same time; they will
We propose changing the retry policy that datanodes use to connect to the SCM.
{code:java}
public void addSCMServer(InetSocketAddress address) throws IOException {
  writeLock();
  try {
    if (scmMachines.containsKey(address)) {
      LOG.warn("Trying to add an existing SCM Machine to Machines group. " +
          "Ignoring the request.");
      return;
    }
    Configuration hadoopConfig =
        LegacyHadoopConfigurationSource.asHadoopConfiguration(this.conf);
    RPC.setProtocolEngine(
        hadoopConfig,
        StorageContainerDatanodeProtocolPB.class,
        ProtobufRpcEngine.class);
    long version =
        RPC.getProtocolVersion(StorageContainerDatanodeProtocolPB.class);
    // Every retry sleeps a fixed 1000 ms, up to the configured retry count.
    RetryPolicy retryPolicy =
        RetryPolicies.retryUpToMaximumCountWithFixedSleep(
            getScmRpcRetryCount(conf),
            1000, TimeUnit.MILLISECONDS);
{code}
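The description does not spell out the replacement policy. One common way to keep many clients from retrying in lockstep is exponential backoff with random jitter, so the sketch below assumes that technique purely for illustration. The class `JitteredBackoff`, its method names, and the 1000 ms base and 60000 ms cap are hypothetical and are not part of Ozone or Hadoop. Hadoop's `RetryPolicies` also provides `exponentialBackoffRetry`, which may be a closer fit in the real code.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not part of Ozone or Hadoop. */
final class JitteredBackoff {
  private JitteredBackoff() { }

  /**
   * Upper bound of the sleep for a 0-based attempt number:
   * min(capMillis, baseMillis * 2^attempt), computed without overflow.
   */
  static long ceilingMillis(int attempt, long baseMillis, long capMillis) {
    if (attempt < 0 || baseMillis <= 0 || capMillis < baseMillis) {
      throw new IllegalArgumentException("invalid backoff parameters");
    }
    // Clamp the shift so that baseMillis << shift stays a positive long.
    int maxShift = Long.numberOfLeadingZeros(baseMillis) - 1;
    int shift = Math.min(attempt, maxShift);
    return Math.min(capMillis, baseMillis << shift);
  }

  /** "Full jitter": a uniformly random sleep in [0, ceiling). */
  static long sleepMillis(int attempt, long baseMillis, long capMillis) {
    long ceiling = ceilingMillis(attempt, baseMillis, capMillis);
    return ThreadLocalRandom.current().nextLong(ceiling);
  }

  public static void main(String[] args) {
    // Assumed values: 1 s base (the fixed sleep shown above) and a 60 s cap.
    for (int attempt = 0; attempt < 8; attempt++) {
      System.out.printf("attempt %d: ceiling=%d ms, sleep=%d ms%n",
          attempt,
          ceilingMillis(attempt, 1000, 60000),
          sleepMillis(attempt, 1000, 60000));
    }
  }
}
```

With a fixed sleep, datanodes that lose the SCM at the same moment also retry at the same moments. Randomizing each sleep spreads those reconnect attempts over the whole interval, and the growing ceiling lowers the retry rate while the SCM stays down.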
was:
We propose to change datanode retry policy to connect SCM.
> A restarted SCM quickly OOM due to ContainerReport Storm from DN cluster.
> -------------------------------------------------------------------------
>
> Key: HDDS-4754
> URL: https://issues.apache.org/jira/browse/HDDS-4754
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: runzhiwang
> Priority: Major
> Attachments: 企业微信截图_1611734015772.png
>