[ 
https://issues.apache.org/jira/browse/HDDS-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aryan Gupta resolved HDDS-9581.
-------------------------------
    Resolution: Cannot Reproduce

> [MasterNode decommissioning] Unable to decommission OM, no OM leader found.
> ---------------------------------------------------------------------------
>
>                 Key: HDDS-9581
>                 URL: https://issues.apache.org/jira/browse/HDDS-9581
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Pratyush Bhatt
>            Assignee: Aryan Gupta
>            Priority: Major
>
> *Scenario:* Decommission OM.
> *Steps:*
> 1. Add the OM decommissioning property (om151 in our case).
> {code:java}
> 2023-10-27 13:13:58,529|INFO|MainThread|machine.py:190 - 
> run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|RUNNING: ozone admin om 
> getserviceroles -id=ozone1 | egrep 'FOLLOWER|LEADER'
> 2023-10-27 13:14:02,605|INFO|MainThread|machine.py:205 - 
> run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|om151 : FOLLOWER 
> (ozn-decom75-5.ozn-decom75.xyz)
> 2023-10-27 13:14:02,606|INFO|MainThread|machine.py:212 - 
> run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|om181 : FOLLOWER 
> (ozn-decom75-3.ozn-decom75.xyz)
> 2023-10-27 13:14:02,606|INFO|MainThread|machine.py:212 - 
> run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|om176 : LEADER 
> (ozn-decom75-1.ozn-decom75.xyz)
> 2023-10-27 13:14:02,607|INFO|MainThread|machine.py:232 - 
> run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|Exit Code: 0 {code}
> {code:java}
> 2023-10-27 13:14:06,505|INFO|MainThread|cm_apilib.py:818 - setConfig()|Update 
> Config = {'ozone.om.decommissioned.nodes.ozone1': 'om151'} for Service = ozone
> {code}
> {code:java}
> 2023-10-27 13:22:18,099|INFO|MainThread|ozone.py:4203 - 
> addOMDecommissionProperty()|Configs successfully copied
> 2023-10-27 13:22:18,099|INFO|MainThread|ozone.py:4400 - 
> omNodeDecommission()|OM Decommissioning property addition successful! {code}
> 2. Decommission the OM.
> {code:java}
> 2023-10-27 13:22:19,713|INFO|MainThread|cm_apilib.py:1126 - 
> roleCommandByName()|Command name = OzoneOMDecommissionCommand, ID = 12953
> 2023-10-27 13:22:19,714|INFO|MainThread|cm_apilib.py:1135 - 
> roleCommandByName()|Wait until request completes...
> 2023-10-27 13:22:19,714|INFO|MainThread|cm_apilib.py:1565 - 
> wait_until_request_complete()|Checking Command ID = 12953 
> {code}
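
Since the issue title reports that no OM leader was found, it may help to confirm the leader programmatically before issuing the decommission command. The following is a minimal sketch, assuming the role lines have the `<node> : <ROLE> (<host>)` shape shown in the step 1 log. The names `parse_service_roles` and `find_leader` are hypothetical helpers, not part of Ozone or of the test framework in the logs.

```python
import re

# Matches lines such as "om176 : LEADER (ozn-decom75-1.ozn-decom75.xyz)".
# The pattern is an assumption based on the sample output in this issue.
ROLE_LINE = re.compile(
    r"(?P<node>[\w.-]+)\s*:\s*(?P<role>LEADER|FOLLOWER)\s*\((?P<host>[^)]+)\)"
)


def parse_service_roles(output):
    """Return {node_id: (role, host)} for every role line found in output."""
    roles = {}
    for line in output.splitlines():
        match = ROLE_LINE.search(line)
        if match:
            roles[match.group("node")] = (match.group("role"), match.group("host"))
    return roles


def find_leader(roles):
    """Return the node ID of the single leader, or None if there is not exactly one."""
    leaders = [node for node, (role, _host) in roles.items() if role == "LEADER"]
    return leaders[0] if len(leaders) == 1 else None


# Sample taken from the getserviceroles output quoted in step 1.
SAMPLE = """om151 : FOLLOWER (ozn-decom75-5.ozn-decom75.xyz)
om181 : FOLLOWER (ozn-decom75-3.ozn-decom75.xyz)
om176 : LEADER (ozn-decom75-1.ozn-decom75.xyz)"""

if __name__ == "__main__":
    print(find_leader(parse_service_roles(SAMPLE)))
```

A `None` result from `find_leader` just before the decommission step would suggest the ring lost its leader between the role check and the command, which is what the Ratis logs under *Observed* appear to show.
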
> *Observed:*
> Decommissioning of the OM with ID om151 failed. 
> While the decommissioning property was being added, the following I/O 
> exception logs appeared on the leader _[om176 : LEADER (ozn-decom75-1.ozn-decom75.xyz)]_:
> {code:java}
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection:
>  om176@group-9F198C4C3682-LeaderElection2 got exception when requesting 
> votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection:
>  om176@group-9F198C4C3682-LeaderElection2 got exception when requesting 
> votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection:
>  om176@group-9F198C4C3682-LeaderElection2: PRE_VOTE REJECTED received 0 
> response(s) and 2 exception(s):
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection:
>    Exception 0: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection:
>    Exception 1: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection:
>  om176@group-9F198C4C3682-LeaderElection2 PRE_VOTE round 0: result REJECTED
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.RaftServer$Division:
>  om176@group-9F198C4C3682: changes role from CANDIDATE to FOLLOWER at term 74 
> for REJECTED
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.RoleInfo:
>  om176: shutdown om176@group-9F198C4C3682-LeaderElection2
> 2023-10-27 13:17:11,943 INFO 
> [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.RoleInfo:
>  om176: start om176@group-9F198C4C3682-FollowerState
> 2023-10-27 13:17:17,036 INFO 
> [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.impl.FollowerState:
>  om176@group-9F198C4C3682-FollowerState: change to CANDIDATE, 
> lastRpcElapsedTime:5092555240ns, electionTimeout:5092ms
> 2023-10-27 13:17:17,037 INFO 
> [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.impl.RoleInfo:
>  om176: shutdown om176@group-9F198C4C3682-FollowerState
> 2023-10-27 13:17:17,037 INFO 
> [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.RaftServer$Division:
>  om176@group-9F198C4C3682: changes role from  FOLLOWER to CANDIDATE at term 
> 74 for changeToCandidate
> 2023-10-27 13:17:17,037 INFO 
> [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.RaftServerConfigKeys:
>  raft.server.leaderelection.pre-vote = true (default)
> 2023-10-27 13:17:17,037 INFO 
> [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.impl.RoleInfo:
>  om176: start om176@group-9F198C4C3682-LeaderElection3
> 2023-10-27 13:17:17,038 INFO 
> [om176@group-9F198C4C3682-LeaderElection3]-org.apache.ratis.server.impl.LeaderElection:
>  om176@group-9F198C4C3682-LeaderElection3 PRE_VOTE round 0: submit vote 
> requests at term 74 for 411235: 
> peers:[om151|rpc:ozn-decom75-5.ozn-decom75.xyz:1111|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
>  
> om181|rpc:ozn-decom75-3.ozn-decom75.xyz:1111|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER,
>  
> om176|rpc:ozn-decom75-1.ozn-decom75.xyz:1111|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[],
>  old=null
> 2023-10-27 13:17:17,038 INFO 
> [om176@group-9F198C4C3682-LeaderElection3]-org.apache.ratis.server.impl.LeaderElection:
>  om176@group-9F198C4C3682-LeaderElection3 got exception when requesting 
> votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception
> 2023-10-27 13:17:17,039 INFO 
> [om176@group-9F198C4C3682-LeaderElection3]-org.apache.ratis.server.impl.LeaderElection:
>  om176@group-9F198C4C3682-LeaderElection3 got exception when requesting 
> votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io 
> exception {code}
> _Unable to load library: ozone_rocksdb_tools_ was also logged before this error:
> {code:java}
> 2023-10-27 13:16:57,063 INFO 
> [main]-org.apache.hadoop.hdds.utils.NativeLibraryLoader: Loading Library: 
> ozone_rocksdb_tools
> 2023-10-27 13:16:57,064 WARN 
> [main]-org.apache.hadoop.hdds.utils.NativeLibraryLoader: Unable to load 
> library: ozone_rocksdb_tools
> java.io.IOException: Permission denied
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createTempFile(File.java:2024)
>         at 
> org.apache.hadoop.hdds.utils.NativeLibraryLoader.copyResourceFromJarToTemp(NativeLibraryLoader.java:140)
>         at 
> org.apache.hadoop.hdds.utils.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:116)
>         at 
> org.apache.hadoop.hdds.utils.db.managed.ManagedSSTDumpTool.<clinit>(ManagedSSTDumpTool.java:39)
>         at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.initSSTDumpTool(SnapshotDiffManager.java:310)
>         at 
> org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.<init>(SnapshotDiffManager.java:269)
>         at 
> org.apache.hadoop.ozone.om.OmSnapshotManager.<init>(OmSnapshotManager.java:277)
>         at 
> org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:840)
>         at 
> org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:670)
>         at 
> org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:752)
>         at 
> org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
>         at 
> org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
>         at 
> org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
>         at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
>         at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
>         at picocli.CommandLine.access$1300(CommandLine.java:145)
>         at 
> picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
>         at 
> picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
>         at picocli.CommandLine.execute(CommandLine.java:2078)
>         at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
>         at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
>         at 
> org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
> 2023-10-27 13:16:57,066 INFO 
> [main]-org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Shutting down 
> executorService: 'SstDumpToolExecutor'
> 2023-10-27 13:16:57,207 WARN 
> [main]-org.apache.hadoop.ozone.om.ratis.utils.OzoneManagerRatisUtils: 
> ozone.om.ratis.snapshot.dir is not configured. Falling back to 
> ozone.metadata.dirs config {code}
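
The `Permission denied` in the stack trace above is raised from `File.createTempFile`, so one thing worth checking is whether the OM process user can create files in the temporary directory it uses. The following is a minimal sketch of such a probe, assuming it is run as the same user as the OM. Which directory the OM actually uses is not stated in this issue, so the directory argument is left to the caller, and `can_create_temp_file` is a hypothetical helper name.

```python
import os
import tempfile


def can_create_temp_file(directory):
    """Return True if the current user can create and remove a file in directory."""
    try:
        # mkstemp creates the file exclusively, similar in spirit to
        # java.io.File.createTempFile in the stack trace.
        fd, path = tempfile.mkstemp(prefix="probe_", dir=directory)
    except OSError:
        # Covers a permission error as well as a missing directory.
        return False
    os.close(fd)
    os.remove(path)
    return True


if __name__ == "__main__":
    # Assumption: the system default temp directory is the one of interest.
    print(can_create_temp_file(tempfile.gettempdir()))
```

A `False` result would be consistent with the native library warning, although it would not by itself explain the failed leader election.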



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
