Pratyush Bhatt created HDDS-9517:
------------------------------------

             Summary: [MasterNode decommissioning] Dead Datanode not listed after recommissioning SCM
                 Key: HDDS-9517
                 URL: https://issues.apache.org/jira/browse/HDDS-9517
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Pratyush Bhatt


*Scenario:* Check if the newly commissioned SCM can detect a dead datanode.

*Steps:*
1. Set dead node and stale node intervals
{code:java}
[root@ozn-decom69-7 ~]# ozone getconf -confKey ozone.scm.stale.node.interval
2m
[root@ozn-decom69-7 ~]# ozone getconf -confKey ozone.scm.dead.node.interval
4m {code}
2. Stop an Ozone Datanode.
3. Decommission an SCM node.
4. The stopped Datanode becomes DEAD at this point.
5. Recommission the same SCM Node.
6. Transfer the leadership to the new SCM Node.
7. Check the datanode list. 
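
Step 4 depends on the intervals from step 1 having elapsed. As a rough aid for timing the reproduction, the sketch below converts the configured values (`2m`, `4m`) to seconds and estimates the health state for a given silence period. The suffix table and the `expected_health` helper are assumptions made for this illustration; they are not Ozone's own parser or state machine.

```python
import re

# Suffix-to-seconds table. This is an assumption for the sketch,
# not a copy of Ozone's configuration parser.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(value):
    """Convert a value such as '2m' or '4m' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m|h|d)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised interval: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


# Values reported by `ozone getconf` in step 1.
STALE_AFTER = parse_interval("2m")  # ozone.scm.stale.node.interval
DEAD_AFTER = parse_interval("4m")   # ozone.scm.dead.node.interval


def expected_health(seconds_since_heartbeat):
    """Hypothetical helper: health state expected after a silence period."""
    if seconds_since_heartbeat >= DEAD_AFTER:
        return "DEAD"
    if seconds_since_heartbeat >= STALE_AFTER:
        return "STALE"
    return "HEALTHY"


if __name__ == "__main__":
    for elapsed in (0, 130, 240):
        print(elapsed, expected_health(elapsed))
```

With these settings, the stopped Datanode would be expected to show as DEAD roughly four minutes after its last heartbeat, so the listing in step 7 should be taken after that point.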



*Timeline:* 
{code:java}
SCM ROLES AT THE START

[root@ozn-decom69-7 ~]# ozone admin scm roles
ozn-decom69-2.ozn-decom69.xyz:1234:FOLLOWER:cc7a176f-261c-4311-a54d-9c1900c9865b:172.27.33.144
ozn-decom69-4.ozn-decom69.xyz:1234:LEADER:07cb1c07-6e05-4577-8f68-ad4769aae2ee:172.27.16.209
ozn-decom69-7.ozn-decom69.xyz:1234:FOLLOWER:e0b1fef7-3cf3-4d24-ba7a-8ad74cb9bc54:172.27.92.5

DATANODE INFO (8 DNs in total)

[root@ozn-decom69-7 ~]# ozone admin datanode list | egrep 'Datanode:|Operational State:|Health State:'
Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 8f4e9d3a-6fe9-46f3-80be-605933eadfac (/default/172.27.140.131/ozn-decom69-5.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: b314fc94-4661-4c1a-a2ff-ad3c667c1ba0 (/default/172.27.23.128/ozn-decom69-8.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 5646a71c-8210-46cc-8c5f-c3f1e1889791 (/default/172.27.16.209/ozn-decom69-4.ozn-decom69.xyz/1 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 589ae3cd-f57d-4b78-87cc-1892c095a877 (/default/172.27.110.132/ozn-decom69-1.ozn-decom69.xyz/1 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 677299b4-a572-4028-8b57-3958bbe3049f (/default/172.27.92.5/ozn-decom69-7.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 771f7644-8551-4c5f-851f-374115031aa4 (/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 77536a60-0e34-4e29-8753-4cae449a3b8e (/default/172.27.26.19/ozn-decom69-6.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY

STOP ONE OF THE DATANODES (ozn-decom69-6.ozn-decom69.xyz in our case)

DECOMMISSION ONE OF THE SCM NODES

[root@ozn-decom69-7 ~]# ozone admin scm decommission -nodeid=e0b1fef7-3cf3-4d24-ba7a-8ad74cb9bc54
Decommissioned Scm e0b1fef7-3cf3-4d24-ba7a-8ad74cb9bc54

THE STOPPED DATANODE IS MARKED DEAD AT THIS POINT.

[root@ozn-decom69-7 ~]# ozone admin datanode list | egrep 'Datanode:|Operational State:|Health State:'
Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 8f4e9d3a-6fe9-46f3-80be-605933eadfac (/default/172.27.140.131/ozn-decom69-5.ozn-decom69.xyz/2 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: b314fc94-4661-4c1a-a2ff-ad3c667c1ba0 (/default/172.27.23.128/ozn-decom69-8.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 5646a71c-8210-46cc-8c5f-c3f1e1889791 (/default/172.27.16.209/ozn-decom69-4.ozn-decom69.xyz/2 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 589ae3cd-f57d-4b78-87cc-1892c095a877 (/default/172.27.110.132/ozn-decom69-1.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 677299b4-a572-4028-8b57-3958bbe3049f (/default/172.27.92.5/ozn-decom69-7.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 771f7644-8551-4c5f-851f-374115031aa4 (/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 77536a60-0e34-4e29-8753-4cae449a3b8e (/default/172.27.26.19/ozn-decom69-6.ozn-decom69.xyz/0 pipelines)
Operational State: IN_SERVICE
Health State: DEAD

CURRENT SCM ROLES

[root@ozn-decom69-7 ~]# ozone admin scm roles
ozn-decom69-2.ozn-decom69.xyz:1234:FOLLOWER:cc7a176f-261c-4311-a54d-9c1900c9865b:172.27.33.144
ozn-decom69-4.ozn-decom69.xyz:1234:LEADER:07cb1c07-6e05-4577-8f68-ad4769aae2ee:172.27.16.209

RECOMMISSION THE SAME SCM

[root@ozn-decom69-7 ~]# ozone admin scm roles
ozn-decom69-2.ozn-decom69.xyz:9894:FOLLOWER:cc7a176f-261c-4311-a54d-9c1900c9865b:172.27.33.144
ozn-decom69-4.ozn-decom69.xyz:9894:LEADER:07cb1c07-6e05-4577-8f68-ad4769aae2ee:172.27.16.209
ozn-decom69-7.ozn-decom69.xyz:9894:FOLLOWER:ae9f74f0-7452-448e-9731-4d88f6221b6b:172.27.92.5

CHECKING THE DN STATUS

[root@ozn-decom69-7 ~]# ozone admin datanode list | egrep 'Datanode:|Operational State:|Health State:'
Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 8f4e9d3a-6fe9-46f3-80be-605933eadfac (/default/172.27.140.131/ozn-decom69-5.ozn-decom69.xyz/2 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: b314fc94-4661-4c1a-a2ff-ad3c667c1ba0 (/default/172.27.23.128/ozn-decom69-8.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 5646a71c-8210-46cc-8c5f-c3f1e1889791 (/default/172.27.16.209/ozn-decom69-4.ozn-decom69.xyz/2 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 589ae3cd-f57d-4b78-87cc-1892c095a877 (/default/172.27.110.132/ozn-decom69-1.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 677299b4-a572-4028-8b57-3958bbe3049f (/default/172.27.92.5/ozn-decom69-7.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 771f7644-8551-4c5f-851f-374115031aa4 (/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 77536a60-0e34-4e29-8753-4cae449a3b8e (/default/172.27.26.19/ozn-decom69-6.ozn-decom69.xyz/0 pipelines)
Operational State: IN_SERVICE
Health State: DEAD

TRANSFER THE LEADERSHIP TO THE NEWLY COMMISSIONED SCM NODE

[root@ozn-decom69-7 ~]# ozone admin scm transfer -n ae9f74f0-7452-448e-9731-4d88f6221b6b
Transfer leadership successfully to ae9f74f0-7452-448e-9731-4d88f6221b6b.

CHECK DN STATUS (only shows 7 nodes, ozn-decom69-6.ozn-decom69.xyz is gone)

[root@ozn-decom69-7 ~]# ozone admin datanode list | egrep 'Datanode:|Operational State:|Health State:'
Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 8f4e9d3a-6fe9-46f3-80be-605933eadfac (/default/172.27.140.131/ozn-decom69-5.ozn-decom69.xyz/2 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: b314fc94-4661-4c1a-a2ff-ad3c667c1ba0 (/default/172.27.23.128/ozn-decom69-8.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 5646a71c-8210-46cc-8c5f-c3f1e1889791 (/default/172.27.16.209/ozn-decom69-4.ozn-decom69.xyz/2 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 589ae3cd-f57d-4b78-87cc-1892c095a877 (/default/172.27.110.132/ozn-decom69-1.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 677299b4-a572-4028-8b57-3958bbe3049f (/default/172.27.92.5/ozn-decom69-7.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 771f7644-8551-4c5f-851f-374115031aa4 (/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY{code}
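
One detail in the timeline is easy to miss: after recommissioning, the SCM on ozn-decom69-7 reports a different node ID than it had at the start. The sketch below parses the two `ozone admin scm roles` outputs from the timeline and reports hosts whose ID changed. The field order (host, port, role, node ID, address) is inferred from the output in this report, and `parse_scm_roles` and `changed_node_ids` are hypothetical helper names. Whether the new ID is related to the missing Datanode is not established here.

```python
from typing import NamedTuple


class ScmRole(NamedTuple):
    host: str
    port: int
    role: str
    node_id: str
    address: str


def parse_scm_roles(text):
    """Parse lines shaped like host:port:ROLE:node-id:address."""
    roles = []
    for line in text.splitlines():
        parts = line.strip().split(":", 4)
        # Skip prompts, blank lines and anything not in the assumed shape.
        if len(parts) != 5 or not parts[1].isdigit():
            continue
        host, port, role, node_id, address = parts
        roles.append(ScmRole(host, int(port), role, node_id, address))
    return roles


def changed_node_ids(before, after):
    """Map host -> (old id, new id) for hosts present in both listings."""
    old = {r.host: r.node_id for r in before}
    return {
        r.host: (old[r.host], r.node_id)
        for r in after
        if r.host in old and old[r.host] != r.node_id
    }


# Outputs copied from the timeline above.
ROLES_AT_START = """
ozn-decom69-2.ozn-decom69.xyz:1234:FOLLOWER:cc7a176f-261c-4311-a54d-9c1900c9865b:172.27.33.144
ozn-decom69-4.ozn-decom69.xyz:1234:LEADER:07cb1c07-6e05-4577-8f68-ad4769aae2ee:172.27.16.209
ozn-decom69-7.ozn-decom69.xyz:1234:FOLLOWER:e0b1fef7-3cf3-4d24-ba7a-8ad74cb9bc54:172.27.92.5
"""
ROLES_AFTER_RECOMMISSION = """
ozn-decom69-2.ozn-decom69.xyz:9894:FOLLOWER:cc7a176f-261c-4311-a54d-9c1900c9865b:172.27.33.144
ozn-decom69-4.ozn-decom69.xyz:9894:LEADER:07cb1c07-6e05-4577-8f68-ad4769aae2ee:172.27.16.209
ozn-decom69-7.ozn-decom69.xyz:9894:FOLLOWER:ae9f74f0-7452-448e-9731-4d88f6221b6b:172.27.92.5
"""

CHANGED = changed_node_ids(
    parse_scm_roles(ROLES_AT_START), parse_scm_roles(ROLES_AFTER_RECOMMISSION)
)

if __name__ == "__main__":
    for host, (old_id, new_id) in CHANGED.items():
        print(f"{host}: {old_id} -> {new_id}")
```
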
*Observed behavior:*

After all the steps, the recommissioned SCM, now the leader, does not detect the 
dead Datanode, and the node is missing from the DN list entirely.
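
The comparison can be made mechanical. The following sketch parses the filtered `ozone admin datanode list` output and reports Datanodes that disappeared or changed health state between two listings. The regular expression is inferred from the output shown in this report, the sample data is a two-node excerpt of the timeline, and `parse_datanodes` and `compare_listings` are hypothetical helper names.

```python
import re

# Pattern inferred from the egrep-filtered output in this report; \s+ lets it
# match whether or not the mailer wrapped the "Datanode:" line.
_ENTRY = re.compile(
    r"Datanode:\s+(?P<uuid>\S+)\s+"
    r"\(/[^/\s]+/(?P<ip>[^/\s]+)/(?P<host>[^/\s]+)/(?P<pipelines>\d+) pipelines\)\s+"
    r"Operational State:\s+(?P<op_state>\S+)\s+"
    r"Health State:\s+(?P<health>\S+)"
)


def parse_datanodes(text):
    """Return {uuid: {"host": ..., "op_state": ..., "health": ...}}."""
    return {
        m["uuid"]: {"host": m["host"], "op_state": m["op_state"], "health": m["health"]}
        for m in _ENTRY.finditer(text)
    }


def compare_listings(before, after):
    """Return (hosts missing from `after`, {host: (old, new)} health changes)."""
    missing = sorted(before[u]["host"] for u in before.keys() - after.keys())
    changed = {
        before[u]["host"]: (before[u]["health"], after[u]["health"])
        for u in before.keys() & after.keys()
        if before[u]["health"] != after[u]["health"]
    }
    return missing, changed


# Excerpt of the listing taken before the leadership transfer.
BEFORE_TRANSFER = """
Datanode: 771f7644-8551-4c5f-851f-374115031aa4
(/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 77536a60-0e34-4e29-8753-4cae449a3b8e
(/default/172.27.26.19/ozn-decom69-6.ozn-decom69.xyz/0 pipelines)
Operational State: IN_SERVICE
Health State: DEAD
"""

# Excerpt of the listing taken after the leadership transfer.
AFTER_TRANSFER = """
Datanode: 771f7644-8551-4c5f-851f-374115031aa4
(/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
"""

MISSING, CHANGED = compare_listings(
    parse_datanodes(BEFORE_TRANSFER), parse_datanodes(AFTER_TRANSFER)
)

if __name__ == "__main__":
    print("missing:", MISSING)
    print("health changes:", CHANGED)
```

Keying on the Datanode UUID rather than the hostname keeps the comparison stable if a host is re-registered under a new identity.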

*Expected behavior:*

The DN state should be preserved: the stopped Datanode should still be listed, with Health State: DEAD.



