[ 
https://issues.apache.org/jira/browse/HDDS-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandakumar resolved HDDS-9517.
------------------------------
    Resolution: Not A Bug

> [MasterNode decommissioning] Dead Datanode not listed after recommissioning SCM
> -------------------------------------------------------------------------------
>
>                 Key: HDDS-9517
>                 URL: https://issues.apache.org/jira/browse/HDDS-9517
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Pratyush Bhatt
>            Assignee: Nandakumar
>            Priority: Major
>
> *Scenario:* Check if the newly commissioned SCM can detect a dead datanode.
> *Steps:*
> 1. Set dead node and stale node intervals
> {code:java}
> [root@ozn-decom69-7 ~]#
> [root@ozn-decom69-7 ~]# ozone getconf -confKey ozone.scm.stale.node.interval
> 2m
> [root@ozn-decom69-7 ~]# ozone getconf -confKey ozone.scm.dead.node.interval
> 4m {code}
> 2. Stop an Ozone Datanode.
> 3. Decommission an SCM node.
> 4. The stopped Datanode becomes dead at this point.
> 5. Recommission the same SCM Node.
> 6. Transfer the leadership to the new SCM Node.
> 7. Check the datanode list. 
> *Timeline:* 
> {code:java}
> ROLES AT THE START
> [root@ozn-decom69-7 ~]# ozone admin scm roles
> ozn-decom69-2.ozn-decom69.xyz:1234:FOLLOWER:cc7a176f-261c-4311-a54d-9c1900c9865b:172.27.33.144
> ozn-decom69-4.ozn-decom69.xyz:1234:LEADER:07cb1c07-6e05-4577-8f68-ad4769aae2ee:172.27.16.209
> ozn-decom69-7.ozn-decom69.xyz:1234:FOLLOWER:e0b1fef7-3cf3-4d24-ba7a-8ad74cb9bc54:172.27.92.5
> DATANODE INFO (8 DNs in total)
> [root@ozn-decom69-7 ~]# ozone admin datanode list | egrep 'Datanode:|Operational State:|Health State:'
> Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 
> (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 8f4e9d3a-6fe9-46f3-80be-605933eadfac 
> (/default/172.27.140.131/ozn-decom69-5.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: b314fc94-4661-4c1a-a2ff-ad3c667c1ba0 
> (/default/172.27.23.128/ozn-decom69-8.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 5646a71c-8210-46cc-8c5f-c3f1e1889791 
> (/default/172.27.16.209/ozn-decom69-4.ozn-decom69.xyz/1 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 589ae3cd-f57d-4b78-87cc-1892c095a877 
> (/default/172.27.110.132/ozn-decom69-1.ozn-decom69.xyz/1 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 677299b4-a572-4028-8b57-3958bbe3049f 
> (/default/172.27.92.5/ozn-decom69-7.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 771f7644-8551-4c5f-851f-374115031aa4 
> (/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 77536a60-0e34-4e29-8753-4cae449a3b8e 
> (/default/172.27.26.19/ozn-decom69-6.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> STOP ONE OF THE DATANODES (ozn-decom69-6.ozn-decom69.xyz in our case)
> DECOMMISSION ONE OF THE SCM NODES
> [root@ozn-decom69-7 ~]# ozone admin scm decommission -nodeid=e0b1fef7-3cf3-4d24-ba7a-8ad74cb9bc54
> Decommissioned Scm e0b1fef7-3cf3-4d24-ba7a-8ad74cb9bc54
> DATANODE IS MARKED DEAD AT THIS POINT.
> [root@ozn-decom69-7 ~]# ozone admin datanode list | egrep 'Datanode:|Operational State:|Health State:'
> Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 
> (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 8f4e9d3a-6fe9-46f3-80be-605933eadfac 
> (/default/172.27.140.131/ozn-decom69-5.ozn-decom69.xyz/2 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: b314fc94-4661-4c1a-a2ff-ad3c667c1ba0 
> (/default/172.27.23.128/ozn-decom69-8.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 5646a71c-8210-46cc-8c5f-c3f1e1889791 
> (/default/172.27.16.209/ozn-decom69-4.ozn-decom69.xyz/2 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 589ae3cd-f57d-4b78-87cc-1892c095a877 
> (/default/172.27.110.132/ozn-decom69-1.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 677299b4-a572-4028-8b57-3958bbe3049f 
> (/default/172.27.92.5/ozn-decom69-7.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 771f7644-8551-4c5f-851f-374115031aa4 
> (/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 77536a60-0e34-4e29-8753-4cae449a3b8e 
> (/default/172.27.26.19/ozn-decom69-6.ozn-decom69.xyz/0 pipelines)
> Operational State: IN_SERVICE
> Health State: DEAD
> CURRENT SCM ROLES
> [root@ozn-decom69-7 ~]# ozone admin scm roles
> ozn-decom69-2.ozn-decom69.xyz:1234:FOLLOWER:cc7a176f-261c-4311-a54d-9c1900c9865b:172.27.33.144
> ozn-decom69-4.ozn-decom69.xyz:1234:LEADER:07cb1c07-6e05-4577-8f68-ad4769aae2ee:172.27.16.209
> RECOMMISSIONED THE SAME SCM BACK
> [root@ozn-decom69-7 ~]# ozone admin scm roles
> ozn-decom69-2.ozn-decom69.xyz:9894:FOLLOWER:cc7a176f-261c-4311-a54d-9c1900c9865b:172.27.33.144
> ozn-decom69-4.ozn-decom69.xyz:9894:LEADER:07cb1c07-6e05-4577-8f68-ad4769aae2ee:172.27.16.209
> ozn-decom69-7.ozn-decom69.xyz:9894:FOLLOWER:ae9f74f0-7452-448e-9731-4d88f6221b6b:172.27.92.5
> CHECKING THE DN STATUS
> [root@ozn-decom69-7 ~]# ozone admin datanode list | egrep 'Datanode:|Operational State:|Health State:'
> Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 
> (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 8f4e9d3a-6fe9-46f3-80be-605933eadfac 
> (/default/172.27.140.131/ozn-decom69-5.ozn-decom69.xyz/2 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: b314fc94-4661-4c1a-a2ff-ad3c667c1ba0 
> (/default/172.27.23.128/ozn-decom69-8.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 5646a71c-8210-46cc-8c5f-c3f1e1889791 
> (/default/172.27.16.209/ozn-decom69-4.ozn-decom69.xyz/2 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 589ae3cd-f57d-4b78-87cc-1892c095a877 
> (/default/172.27.110.132/ozn-decom69-1.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 677299b4-a572-4028-8b57-3958bbe3049f 
> (/default/172.27.92.5/ozn-decom69-7.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 771f7644-8551-4c5f-851f-374115031aa4 
> (/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 77536a60-0e34-4e29-8753-4cae449a3b8e 
> (/default/172.27.26.19/ozn-decom69-6.ozn-decom69.xyz/0 pipelines)
> Operational State: IN_SERVICE
> Health State: DEAD
> TRANSFER THE LEADERSHIP TO THE NEWLY COMMISSIONED SCM NODE
> [root@ozn-decom69-7 ~]# ozone admin scm transfer -n ae9f74f0-7452-448e-9731-4d88f6221b6b
> Transfer leadership successfully to ae9f74f0-7452-448e-9731-4d88f6221b6b.
> CHECK DN STATUS (only shows 7 nodes, ozn-decom69-6.ozn-decom69.xyz is gone)
> [root@ozn-decom69-7 ~]# ozone admin datanode list | egrep 'Datanode:|Operational State:|Health State:'
> Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 
> (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 8f4e9d3a-6fe9-46f3-80be-605933eadfac 
> (/default/172.27.140.131/ozn-decom69-5.ozn-decom69.xyz/2 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: b314fc94-4661-4c1a-a2ff-ad3c667c1ba0 
> (/default/172.27.23.128/ozn-decom69-8.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 5646a71c-8210-46cc-8c5f-c3f1e1889791 
> (/default/172.27.16.209/ozn-decom69-4.ozn-decom69.xyz/2 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 589ae3cd-f57d-4b78-87cc-1892c095a877 
> (/default/172.27.110.132/ozn-decom69-1.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 677299b4-a572-4028-8b57-3958bbe3049f 
> (/default/172.27.92.5/ozn-decom69-7.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY
> Datanode: 771f7644-8551-4c5f-851f-374115031aa4 
> (/default/172.27.33.144/ozn-decom69-2.ozn-decom69.xyz/3 pipelines)
> Operational State: IN_SERVICE
> Health State: HEALTHY{code}
> *Observed behavior:*
> After all the steps, the recommissioned SCM, now the leader, does not detect 
> the dead datanode, and the node is missing from the DN list entirely.
> *Expected behavior:*
> DN state should have been preserved.
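
The before/after comparison in the timeline (8 datanodes listed before the leadership transfer, 7 after) can be checked mechanically. The following is a minimal Python sketch, not taken from the report, that parses the filtered `ozone admin datanode list` output in the format shown above and reports which hosts disappeared. The names `parse_datanodes` and `missing_after` are hypothetical, and the sketch assumes raw CLI output (without the "> " email quoting) captured before and after the transfer.

```python
import re

# One datanode entry from the egrep-filtered listing. \s+ tolerates an entry
# printed on one line or wrapped across several. The layout is assumed from
# the transcript in this report, not from a documented output format.
ENTRY = re.compile(
    r"Datanode:\s+(?P<uuid>[0-9a-f-]+)\s+"
    r"\((?P<path>[^)]*)/(?P<pipelines>\d+) pipelines\)\s+"
    r"Operational State:\s+(?P<op_state>\w+)\s+"
    r"Health State:\s+(?P<health>\w+)"
)


def parse_datanodes(text):
    """Return {uuid: {"host", "ip", "health"}} for every entry in text."""
    nodes = {}
    for m in ENTRY.finditer(text):
        # path looks like /default/<ip>/<hostname>
        _, ip, host = m.group("path").rsplit("/", 2)
        nodes[m.group("uuid")] = {
            "host": host,
            "ip": ip,
            "health": m.group("health"),
        }
    return nodes


def missing_after(before_text, after_text):
    """Hostnames listed in before_text but absent from after_text."""
    before = parse_datanodes(before_text)
    after = parse_datanodes(after_text)
    return sorted(before[u]["host"] for u in before.keys() - after.keys())


# Two entries copied from the report: one healthy node and the dead one.
BEFORE = """\
Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
Datanode: 77536a60-0e34-4e29-8753-4cae449a3b8e (/default/172.27.26.19/ozn-decom69-6.ozn-decom69.xyz/0 pipelines)
Operational State: IN_SERVICE
Health State: DEAD
"""

AFTER = """\
Datanode: 824b5a4a-455a-4910-94e2-8fa723738d44 (/default/172.27.204.79/ozn-decom69-9.ozn-decom69.xyz/3 pipelines)
Operational State: IN_SERVICE
Health State: HEALTHY
"""

print(missing_after(BEFORE, AFTER))  # ['ozn-decom69-6.ozn-decom69.xyz']
```

Keying the comparison on the datanode UUID rather than the hostname avoids false matches if a host re-registers under a new identity.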



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
