[ https://issues.apache.org/jira/browse/HDDS-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sadanand Shenoy updated HDDS-9055:
----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Datanode decommission Failed, Follower never received the command
> -----------------------------------------------------------------
>
>                 Key: HDDS-9055
>                 URL: https://issues.apache.org/jira/browse/HDDS-9055
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM HA
>            Reporter: Soumitra Sulav
>            Assignee: Sumit Agrawal
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> *Issue:*
> As per one of the Cloudera system tests, 2 Datanodes are scheduled for 
> decommission after data write and data pipeline close.
> The LEADER SCM received the scheduled decommission command as expected 
> from the test, but the FOLLOWER SCM never received the decommission.
> *Summary logs:*
> Follower
> {code:java}
> 19:58:04,931 : persistedOpState: DECOMMISSIONING, the value stored in SCM 
> (IN_SERVICE, 0)
> 19:58:10,016 : persistedOpState: IN_SERVICE,  the value stored in SCM 
> (DECOMMISSIONING, 0)
> {code}
> Leader: TimeOut
> {code:java}
> 2023-07-20 19:38:31,689 : persistedOpState: IN_SERVICE, the value stored in 
> SCM (DECOMMISSIONING, 0)
> ...... multiple retries .......
> 2023-07-20 19:55:54,323 : persistedOpState: IN_SERVICE, the value stored in 
> SCM (DECOMMISSIONING, 0)
> 2023-07-20 19:56:24,344 : persistedOpState: IN_SERVICE, the value stored in 
> SCM (DECOMMISSIONING, 0)
> 2023-07-20 19:58:04,931 : persistedOpState: DECOMMISSIONING, the value stored 
> in SCM (IN_SERVICE, 0)
> {code}
> *Detailed logs:*
> {code:java}
> FOLLOWER
> 2023-07-20 19:58:04,931 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> Update the operationalState saved in follower SCM for 
> 33c95701-aaa5-4b08-a56b-70ac5d237187{ip: 172.27.12.66, host: 
> quasar-zqlpfe-5.quasar-zqlpfe.root.hwx.site, ports: [REPLICATION=9886, 
> RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], 
> networkLocation: /default-rack, certSerialId: 70976812254805668, 
> persistedOpState: DECOMMISSIONING, persistedOpStateExpiryEpochSec: 0} as the 
> reported value does not match the value stored in SCM (IN_SERVICE, 0)
> 2023-07-20 19:58:10,016 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> Update the operationalState saved in follower SCM for 
> 33c95701-aaa5-4b08-a56b-70ac5d237187{ip: 172.27.12.66, host: 
> quasar-zqlpfe-5.quasar-zqlpfe.root.hwx.site, ports: [REPLICATION=9886, 
> RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], 
> networkLocation: /default-rack, certSerialId: 70976812254805668, 
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} as the 
> reported value does not match the value stored in SCM (DECOMMISSIONING, 0)
> LEADER
> 2023-07-20 19:56:24,344 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> Scheduling a command to update the operationalState persisted on 
> 33c95701-aaa5-4b08-a56b-70ac5d237187{ip: 172.27.12.66, host: 
> quasar-zqlpfe-5.quasar-zqlpfe.root.hwx.site, ports: [REPLICATION=9886, 
> RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], 
> networkLocation: /default-rack, certSerialId: 70976812254805668, 
> persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} as the 
> reported value does not match the value stored in SCM (DECOMMISSIONING, 0)
> 2023-07-20 19:58:04,931 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: 
> Scheduling a command to update the operationalState persisted on 
> 33c95701-aaa5-4b08-a56b-70ac5d237187{ip: 172.27.12.66, host: 
> quasar-zqlpfe-5.quasar-zqlpfe.root.hwx.site, ports: [REPLICATION=9886, 
> RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], 
> networkLocation: /default-rack, certSerialId: 70976812254805668, 
> persistedOpState: DECOMMISSIONING, persistedOpStateExpiryEpochSec: 0} as the 
> reported value does not match the value stored in SCM (IN_SERVICE, 0)
> {code}
> Please find the attached SCM logs for more details.
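
The log messages above suggest a heartbeat-driven reconciliation step. When a datanode's reported persistedOpState differs from the value an SCM stores, the leader schedules a command asking the datanode to persist SCM's value. A follower, by contrast, overwrites its own record with the reported value.

The Java sketch below models only that rule, as inferred from the log text. It is not Ozone's SCMNodeManager code. The names ScmView, onHeartbeat, commandsToDatanode and the "SET_OP_STATE" string are hypothetical, and persistedOpStateExpiryEpochSec is ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class OpStateReconcileSketch {

    // Subset of operational states seen in the HDDS-9055 logs (hypothetical enum).
    enum OpState { IN_SERVICE, DECOMMISSIONING }

    // Hypothetical stand-in for what a single SCM stores about one datanode.
    static final class ScmView {
        final boolean isLeader;
        OpState stored;
        final List<String> commandsToDatanode = new ArrayList<>();

        ScmView(boolean isLeader, OpState stored) {
            this.isLeader = isLeader;
            this.stored = stored;
        }

        // Called when a heartbeat reports the datanode's persistedOpState.
        void onHeartbeat(OpState reported) {
            if (reported == stored) {
                return; // values agree, nothing to reconcile
            }
            if (isLeader) {
                // Leader: ask the datanode to persist the value this SCM holds.
                commandsToDatanode.add("SET_OP_STATE " + stored);
            } else {
                // Follower: adopt the datanode's reported value as its own record.
                stored = reported;
            }
        }
    }

    public static void main(String[] args) {
        ScmView leader = new ScmView(true, OpState.DECOMMISSIONING);
        ScmView follower = new ScmView(false, OpState.IN_SERVICE);

        // The datanode never applies the command, so it keeps reporting IN_SERVICE.
        for (int i = 0; i < 3; i++) {
            leader.onHeartbeat(OpState.IN_SERVICE);
        }

        // The follower mirrors whatever the datanode last reported.
        follower.onHeartbeat(OpState.DECOMMISSIONING);
        follower.onHeartbeat(OpState.IN_SERVICE);

        System.out.println(leader.commandsToDatanode.size()); // prints 3
        System.out.println(follower.stored); // prints IN_SERVICE
    }
}
```

Under these assumptions, the leader re-issues the command on every mismatching heartbeat, much like the repeated "Scheduling a command" lines in the leader log. The follower only tracks the datanode's latest report, which resembles the DECOMMISSIONING/IN_SERVICE flip in the follower log.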



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
