[jira] [Updated] (HDDS-5708) Skip sending container close command to unhealthy replica

Sammi Chen (Jira) Thu, 02 Sep 2021 00:37:05 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sammi Chen updated HDDS-5708:
-----------------------------
    Description: 
An unhealthy replica cannot be closed by the close container command.  The 
impact is small, except that it produces a huge number of log entries like 
the following in scm.log, which makes problem investigation less efficient.  
This task aims to reduce these useless log entries in scm.log.

Of course, we need a better way to handle an unhealthy container that stays 
in the CLOSING state.  We will look for a solution once we know how a 
container ends up with all 3 replicas unhealthy.


2021-09-01 21:19:10,903 [ReplicationMonitor] INFO 
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending close 
container command for container #110490 to datanode 
0a16a9a7-1af0-4fbe-9b32-9e67df46b4c7{ip: 11.26.17.139, host: 11.26.17.139, 
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858, 
STANDALONE=9859], parent: rack561349, networkLocation: /rack561349, 
certSerialId: null, persistedOpState: IN_SERVICE, 
persistedOpStateExpiryEpochSec: 0}.
2021-09-01 21:24:11,199 [ReplicationMonitor] INFO 
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending close 
container command for container #110490 to datanode 
0a16a9a7-1af0-4fbe-9b32-9e67df46b4c7{ip: 11.26.17.139, host: 11.26.17.139, 
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858, 
STANDALONE=9859], parent: rack561349, networkLocation: /rack561349, 
certSerialId: null, persistedOpState: IN_SERVICE, 
persistedOpStateExpiryEpochSec: 0}.
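
The idea in this task can be sketched as follows. This is only a minimal 
illustration, not the actual ReplicationManager change: the names 
CloseCommandFilter, Replica, ReplicaState and closeTargets are hypothetical 
and are not taken from the Ozone code base. It assumes each replica reports 
a state and simply leaves UNHEALTHY replicas out of the list of close 
command targets, so no command is sent and no log line is written for them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types are the real Ozone classes.
class CloseCommandFilter {

    // Assumed stand-in for the replica states reported by datanodes.
    enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

    // Assumed minimal view of a replica: the datanode holding it and its state.
    static final class Replica {
        final String datanodeId;
        final ReplicaState state;

        Replica(String datanodeId, ReplicaState state) {
            this.datanodeId = datanodeId;
            this.state = state;
        }
    }

    // Returns the datanodes that should receive a close container command.
    // UNHEALTHY replicas are skipped because the command cannot close them.
    static List<String> closeTargets(List<Replica> replicas) {
        List<String> targets = new ArrayList<>();
        for (Replica replica : replicas) {
            if (replica.state == ReplicaState.UNHEALTHY) {
                continue; // no command sent, so nothing is logged for it
            }
            targets.add(replica.datanodeId);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
            new Replica("dn-1", ReplicaState.CLOSING),
            new Replica("dn-2", ReplicaState.UNHEALTHY),
            new Replica("dn-3", ReplicaState.OPEN));
        // Only dn-1 and dn-3 remain as close command targets.
        System.out.println(closeTargets(replicas));
    }
}
```

If all 3 replicas are unhealthy, as in the case described above, the list of 
targets is empty and the repeated log entries disappear; the container 
itself still needs the separate handling mentioned in the description.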



  was:
An unhealthy replica cannot be closed by the close container command.  The 
impact is small, except that it produces a huge number of log entries like 
the following in scm.log.  This task aims to reduce these useless log 
entries in scm.log.

Of course, we need a better way to handle an unhealthy container that stays 
in the CLOSING state.  We will look for a solution once we know how a 
container ends up with all 3 replicas unhealthy.


2021-09-01 21:19:10,903 [ReplicationMonitor] INFO 
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending close 
container command for container #110490 to datanode 
0a16a9a7-1af0-4fbe-9b32-9e67df46b4c7{ip: 11.26.17.139, host: 11.26.17.139, 
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858, 
STANDALONE=9859], parent: rack561349, networkLocation: /rack561349, 
certSerialId: null, persistedOpState: IN_SERVICE, 
persistedOpStateExpiryEpochSec: 0}.
2021-09-01 21:24:11,199 [ReplicationMonitor] INFO 
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending close 
container command for container #110490 to datanode 
0a16a9a7-1af0-4fbe-9b32-9e67df46b4c7{ip: 11.26.17.139, host: 11.26.17.139, 
ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9858, RATIS_SERVER=9858, 
STANDALONE=9859], parent: rack561349, networkLocation: /rack561349, 
certSerialId: null, persistedOpState: IN_SERVICE, 
persistedOpStateExpiryEpochSec: 0}.




> Skip sending container close command to unhealthy replica
> ---------------------------------------------------------
>
>                 Key: HDDS-5708
>                 URL: https://issues.apache.org/jira/browse/HDDS-5708
>             Project: Apache Ozone
>          Issue Type: New Feature
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major



