[
https://issues.apache.org/jira/browse/HDDS-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sammi Chen updated HDDS-5708:
-----------------------------
Description:
An unhealthy replica cannot be closed by a close container command. The impact
is small, except that there will be a large volume of log entries such as
those below in scm.log, which makes problem investigation less efficient. This
task aims to reduce these useless log entries in scm.log.
Of course, we need a better way to handle an unhealthy container that is stuck
in the CLOSING state. We will look for a solution once we know how a container
ends up with all 3 of its replicas unhealthy.
2021-09-01 21:19:10,903 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending close
container command for container #110490 to datanode
0a16a9a7-1af0-4fbe-9b32-9e67df46b4c7{ip: 11.26.17.139, host: 11.26.17.139,
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
STANDALONE=9859], parent: rack561349, networkLocation: /rack561349,
certSerialId: null, persistedOpState: IN_SERVICE,
persistedOpStateExpiryEpochSec: 0}.
2021-09-01 21:24:11,199 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending close
container command for container #110490 to datanode
0a16a9a7-1af0-4fbe-9b32-9e67df46b4c7{ip: 11.26.17.139, host: 11.26.17.139,
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
STANDALONE=9859], parent: rack561349, networkLocation: /rack561349,
certSerialId: null, persistedOpState: IN_SERVICE,
persistedOpStateExpiryEpochSec: 0}.
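The proposed behaviour can be illustrated with a minimal, self-contained
sketch. It is not the actual ReplicationManager code: the class
CloseCommandFilterSketch, the simplified ReplicaState enum, the Replica type,
and the closeTargets method are all hypothetical names used only for
illustration. The assumption is that the fix amounts to filtering out
UNHEALTHY replicas before any close container command is sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CloseCommandFilterSketch {

  /** Simplified, hypothetical replica states; not the real Ozone enum. */
  enum ReplicaState { OPEN, CLOSING, CLOSED, UNHEALTHY }

  /** Hypothetical stand-in for a container replica on a datanode. */
  static final class Replica {
    final String datanodeId;
    final ReplicaState state;

    Replica(String datanodeId, ReplicaState state) {
      this.datanodeId = datanodeId;
      this.state = state;
    }
  }

  /**
   * Returns the datanodes that would receive a close container command.
   * UNHEALTHY replicas are skipped because they cannot be closed this way.
   */
  static List<String> closeTargets(List<Replica> replicas) {
    List<String> targets = new ArrayList<>();
    for (Replica r : replicas) {
      if (r.state == ReplicaState.UNHEALTHY) {
        continue; // no command is sent, so no log line is produced
      }
      targets.add(r.datanodeId);
    }
    return targets;
  }

  public static void main(String[] args) {
    // The case from this issue: all 3 replicas are unhealthy.
    List<Replica> allUnhealthy = Arrays.asList(
        new Replica("dn-1", ReplicaState.UNHEALTHY),
        new Replica("dn-2", ReplicaState.UNHEALTHY),
        new Replica("dn-3", ReplicaState.UNHEALTHY));
    System.out.println(closeTargets(allUnhealthy)); // prints []

    // A mixed case: only the unhealthy replica is skipped.
    List<Replica> mixed = Arrays.asList(
        new Replica("dn-1", ReplicaState.CLOSING),
        new Replica("dn-2", ReplicaState.UNHEALTHY),
        new Replica("dn-3", ReplicaState.OPEN));
    System.out.println(closeTargets(mixed)); // prints [dn-1, dn-3]
  }
}
```

With all 3 replicas unhealthy, the target list is empty, so under this
assumption no close container command would be sent and nothing would be
logged for the container on each ReplicationMonitor run.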
was:
An unhealthy replica cannot be closed by a close container command. The impact
is small, except that there will be a large volume of log entries such as
those below in scm.log. This task aims to reduce these useless log entries in
scm.log.
Of course, we need a better way to handle an unhealthy container that is stuck
in the CLOSING state. We will look for a solution once we know how a container
ends up with all 3 of its replicas unhealthy.
2021-09-01 21:19:10,903 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending close
container command for container #110490 to datanode
0a16a9a7-1af0-4fbe-9b32-9e67df46b4c7{ip: 11.26.17.139, host: 11.26.17.139,
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
STANDALONE=9859], parent: rack561349, networkLocation: /rack561349,
certSerialId: null, persistedOpState: IN_SERVICE,
persistedOpStateExpiryEpochSec: 0}.
2021-09-01 21:24:11,199 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending close
container command for container #110490 to datanode
0a16a9a7-1af0-4fbe-9b32-9e67df46b4c7{ip: 11.26.17.139, host: 11.26.17.139,
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858,
STANDALONE=9859], parent: rack561349, networkLocation: /rack561349,
certSerialId: null, persistedOpState: IN_SERVICE,
persistedOpStateExpiryEpochSec: 0}.
> Skip sending container close command to unhealthy replica
> ---------------------------------------------------------
>
> Key: HDDS-5708
> URL: https://issues.apache.org/jira/browse/HDDS-5708
> Project: Apache Ozone
> Issue Type: New Feature
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)