[ https://issues.apache.org/jira/browse/HDDS-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aryan Gupta resolved HDDS-9581.
-------------------------------
    Resolution: Cannot Reproduce

> [MasterNode decommissioning] Unable to decommission OM, no OM leader found.
> ---------------------------------------------------------------------------
>
>                 Key: HDDS-9581
>                 URL: https://issues.apache.org/jira/browse/HDDS-9581
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Pratyush Bhatt
>            Assignee: Aryan Gupta
>            Priority: Major
>
> *Scenario:* Decommission OM.
> *Steps:*
> 1. Add the OM decommissioning property (om151 in our case).
> {code:java}
> 2023-10-27 13:13:58,529|INFO|MainThread|machine.py:190 - run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|RUNNING: ozone admin om getserviceroles -id=ozone1 | egrep 'FOLLOWER|LEADER'
> 2023-10-27 13:14:02,605|INFO|MainThread|machine.py:205 - run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|om151 : FOLLOWER (ozn-decom75-5.ozn-decom75.xyz)
> 2023-10-27 13:14:02,606|INFO|MainThread|machine.py:212 - run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|om181 : FOLLOWER (ozn-decom75-3.ozn-decom75.xyz)
> 2023-10-27 13:14:02,606|INFO|MainThread|machine.py:212 - run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|om176 : LEADER (ozn-decom75-1.ozn-decom75.xyz)
> 2023-10-27 13:14:02,607|INFO|MainThread|machine.py:232 - run()||GUID=32a08a9e-0ca1-4ad8-8b56-325bd4f89b95|Exit Code: 0
> {code}
> {code:java}
> 2023-10-27 13:14:06,505|INFO|MainThread|cm_apilib.py:818 - setConfig()|Update Config = {'ozone.om.decommissioned.nodes.ozone1': 'om151'} for Service = ozone
> {code}
> {code:java}
> 2023-10-27 13:22:18,099|INFO|MainThread|ozone.py:4203 - addOMDecommissionProperty()|Configs successfully copied
> 2023-10-27 13:22:18,099|INFO|MainThread|ozone.py:4400 - omNodeDecommission()|OM Decommissioning property addition successful!
> {code}
> 2. Decommission the OM.
> {code:java}
> 2023-10-27 13:22:19,713|INFO|MainThread|cm_apilib.py:1126 - roleCommandByName()|Command name = OzoneOMDecommissionCommand, ID = 12953
> 2023-10-27 13:22:19,714|INFO|MainThread|cm_apilib.py:1135 - roleCommandByName()|Wait until request completes...
> 2023-10-27 13:22:19,714|INFO|MainThread|cm_apilib.py:1565 - wait_until_request_complete()|Checking Command ID = 12953
> {code}
> *Observed:*
> Decommission of the OM with id om151 failed.
> While the decommissioning property was being added, the leader _[om176 : LEADER (ozn-decom75-1.ozn-decom75.xyz)]_ logged the following "io exception" errors:
> {code:java}
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection: om176@group-9F198C4C3682-LeaderElection2 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection: om176@group-9F198C4C3682-LeaderElection2 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection: om176@group-9F198C4C3682-LeaderElection2: PRE_VOTE REJECTED received 0 response(s) and 2 exception(s):
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection: Exception 0: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection: Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.LeaderElection: om176@group-9F198C4C3682-LeaderElection2 PRE_VOTE round 0: result REJECTED
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.RaftServer$Division: om176@group-9F198C4C3682: changes role from CANDIDATE to FOLLOWER at term 74 for REJECTED
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.RoleInfo: om176: shutdown om176@group-9F198C4C3682-LeaderElection2
> 2023-10-27 13:17:11,943 INFO [om176@group-9F198C4C3682-LeaderElection2]-org.apache.ratis.server.impl.RoleInfo: om176: start om176@group-9F198C4C3682-FollowerState
> 2023-10-27 13:17:17,036 INFO [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.impl.FollowerState: om176@group-9F198C4C3682-FollowerState: change to CANDIDATE, lastRpcElapsedTime:5092555240ns, electionTimeout:5092ms
> 2023-10-27 13:17:17,037 INFO [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.impl.RoleInfo: om176: shutdown om176@group-9F198C4C3682-FollowerState
> 2023-10-27 13:17:17,037 INFO [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.RaftServer$Division: om176@group-9F198C4C3682: changes role from FOLLOWER to CANDIDATE at term 74 for changeToCandidate
> 2023-10-27 13:17:17,037 INFO [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.RaftServerConfigKeys: raft.server.leaderelection.pre-vote = true (default)
> 2023-10-27 13:17:17,037 INFO [om176@group-9F198C4C3682-FollowerState]-org.apache.ratis.server.impl.RoleInfo: om176: start om176@group-9F198C4C3682-LeaderElection3
> 2023-10-27 13:17:17,038 INFO [om176@group-9F198C4C3682-LeaderElection3]-org.apache.ratis.server.impl.LeaderElection: om176@group-9F198C4C3682-LeaderElection3 PRE_VOTE round 0: submit vote requests at term 74 for 411235: peers:[om151|rpc:ozn-decom75-5.ozn-decom75.xyz:1111|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER, om181|rpc:ozn-decom75-3.ozn-decom75.xyz:1111|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER, om176|rpc:ozn-decom75-1.ozn-decom75.xyz:1111|admin:|client:|dataStream:|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-10-27 13:17:17,038 INFO [om176@group-9F198C4C3682-LeaderElection3]-org.apache.ratis.server.impl.LeaderElection: om176@group-9F198C4C3682-LeaderElection3 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2023-10-27 13:17:17,039 INFO [om176@group-9F198C4C3682-LeaderElection3]-org.apache.ratis.server.impl.LeaderElection: om176@group-9F198C4C3682-LeaderElection3 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> {code}
> The OM also logged _Unable to load library: ozone_rocksdb_tools_ before this error:
> {code:java}
> 2023-10-27 13:16:57,063 INFO [main]-org.apache.hadoop.hdds.utils.NativeLibraryLoader: Loading Library: ozone_rocksdb_tools
> 2023-10-27 13:16:57,064 WARN [main]-org.apache.hadoop.hdds.utils.NativeLibraryLoader: Unable to load library: ozone_rocksdb_tools
> java.io.IOException: Permission denied
>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>     at java.io.File.createTempFile(File.java:2024)
>     at org.apache.hadoop.hdds.utils.NativeLibraryLoader.copyResourceFromJarToTemp(NativeLibraryLoader.java:140)
>     at org.apache.hadoop.hdds.utils.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:116)
>     at org.apache.hadoop.hdds.utils.db.managed.ManagedSSTDumpTool.<clinit>(ManagedSSTDumpTool.java:39)
>     at org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.initSSTDumpTool(SnapshotDiffManager.java:310)
>     at org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.<init>(SnapshotDiffManager.java:269)
>     at org.apache.hadoop.ozone.om.OmSnapshotManager.<init>(OmSnapshotManager.java:277)
>     at org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:840)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:670)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:752)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
>     at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
>     at picocli.CommandLine.access$1300(CommandLine.java:145)
>     at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
>     at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
>     at picocli.CommandLine.execute(CommandLine.java:2078)
>     at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
>     at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
>     at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
> 2023-10-27 13:16:57,066 INFO [main]-org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager: Shutting down executorService: 'SstDumpToolExecutor'
> 2023-10-27 13:16:57,207 WARN [main]-org.apache.hadoop.ozone.om.ratis.utils.OzoneManagerRatisUtils: ozone.om.ratis.snapshot.dir is not configured.
> Falling back to ozone.metadata.dirs config
> {code}
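
The leader check in step 1 (the output of `ozone admin om getserviceroles -id=ozone1`) could be verified programmatically before a decommission is attempted. The following is only a minimal sketch: it assumes the `<node> : <ROLE> (<host>)` line format seen in the log above, and the names `parse_service_roles`, `find_leader` and `SAMPLE_OUTPUT` are hypothetical helpers for illustration, not part of Ozone or of the test framework in this report.

```python
import re

# Assumed line format, taken from the step 1 log output, e.g.
#   om176 : LEADER (ozn-decom75-1.ozn-decom75.xyz)
ROLE_LINE = re.compile(r"^\s*(\S+)\s*:\s*([A-Z_]+)\s*\(([^)]*)\)\s*$")

# Sample text copied from the getserviceroles output quoted in this issue.
SAMPLE_OUTPUT = """\
om151 : FOLLOWER (ozn-decom75-5.ozn-decom75.xyz)
om181 : FOLLOWER (ozn-decom75-3.ozn-decom75.xyz)
om176 : LEADER (ozn-decom75-1.ozn-decom75.xyz)
"""


def parse_service_roles(output):
    """Map each OM node id to a (role, host) tuple; unparseable lines are skipped."""
    roles = {}
    for line in output.splitlines():
        match = ROLE_LINE.match(line)
        if match:
            node_id, role, host = match.groups()
            roles[node_id] = (role, host)
    return roles


def find_leader(roles):
    """Return the node id of the single LEADER, or None if there is not exactly one."""
    leaders = [node for node, (role, _host) in roles.items() if role == "LEADER"]
    return leaders[0] if len(leaders) == 1 else None


roles = parse_service_roles(SAMPLE_OUTPUT)
leader = find_leader(roles)
if leader is None:
    print("no OM leader found; decommission is likely to fail")
else:
    print("OM leader:", leader)
```

Run against the sample above, the sketch reports om176 as leader. Running the same check again immediately before issuing the decommission command would show whether the leader had been lost in the interval, as the election logs in this report suggest.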