[
https://issues.apache.org/jira/browse/HDDS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Pogde updated HDDS-3379:
---------------------------------
Target Version/s: 1.2.0
I am managing the 1.1.0 release and we currently have more than 600 issues
targeted for 1.1.0. I am moving the target field to 1.2.0.
If you are actively working on this jira and believe it should be targeted to
the 1.1.0 release, please change the target field back to 1.1.0 before Feb 05,
2021.
> Clients unable to failover after the OzoneManager leader is restarted in
> MiniOzoneChaosCluster
> --------------------------------------------------------------------------------------------
>
> Key: HDDS-3379
> URL: https://issues.apache.org/jira/browse/HDDS-3379
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Reporter: Mukul Kumar Singh
> Priority: Major
> Labels: MiniOzoneChaosCluster, TriagePending
>
> Clients are unable to failover after the OzoneManager leader is restarted in
> MiniOzoneChaosCluster.
> This happens after the following restart events.
> {code}
> ➜ chaos-2020-04-11-21-51-52-IST egrep "iniOzoneHAClusterImp|Failures" complete.log
> 2020-04-11 21:52:08,296 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10804
> 2020-04-11 21:52:08,387 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10810
> 2020-04-11 21:52:08,485 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10816
> 2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO failure.Failures (FailureManager.java:start(66)) - starting failure manager 60 60 SECONDS
> 2020-04-11 21:53:22,850 [pool-59-thread-1] INFO failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:53:22,853 [pool-59-thread-1] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-3
> 2020-04-11 21:53:22,988 [pool-59-thread-1] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-3
>     at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
>     at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
>     at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
>     at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> 2020-04-11 21:54:22,849 [pool-59-thread-1] INFO failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:54:22,850 [pool-59-thread-1] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-1
> 2020-04-11 21:54:22,895 [pool-59-thread-1] INFO ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-1
>     at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
>     at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
>     at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
>     at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> ➜ chaos-2020-04-11-21-51-52-IST
> {code}
> This results in the following exception.
> {code}
> 2020-04-11 21:54:24,201 [pool-360-thread-4] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(67)) - FilesystemLoadGenerator LOADGEN: Exiting due to exception
> java.io.IOException: java.io.IOException: Could not determine or connect to OM Leader.
>     at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
>     at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:199)
>     at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>     at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.doPostOp(LoadBucket.java:176)
>     at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:132)
>     at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.execute(LoadBucket.java:153)
>     at org.apache.hadoop.ozone.utils.LoadBucket.writeKey(LoadBucket.java:76)
>     at org.apache.hadoop.ozone.loadgenerators.FilesystemLoadGenerator.generateLoad(FilesystemLoadGenerator.java:47)
>     at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.load(LoadExecutors.java:65)
>     at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.lambda$startLoad$0(LoadExecutors.java:89)
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Could not determine or connect to OM Leader.
>     at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:429)
>     at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:843)
>     at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:71)
>     at com.sun.proxy.$Proxy65.allocateBlock(Unknown Source)
>     at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
>     at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
>     at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
> {code}
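
For readers unfamiliar with the behaviour the client is expected to show here, the following is a minimal sketch of a round-robin failover loop. It is not Ozone's actual client code. The class `FailoverSketch`, the interface `OmCall`, and the method `callWithFailover` are hypothetical names made up for illustration. Only the node IDs and the final error message are taken from the logs above. The idea is that a request rejected by one OM is retried against the next OM in the list, with a short pause, until an attempt succeeds or the retry budget runs out.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FailoverSketch {

    /** Hypothetical stand-in for one RPC sent to a single OM node. */
    interface OmCall<T> {
        T call(String omNodeId) throws IOException;
    }

    /**
     * Tries the call against each OM node in round-robin order until one
     * attempt succeeds or maxAttempts is exhausted.
     */
    static <T> T callWithFailover(List<String> omNodeIds, int maxAttempts,
                                  long retryDelayMillis, OmCall<T> call)
            throws IOException, InterruptedException {
        if (omNodeIds.isEmpty()) {
            throw new IllegalArgumentException("at least one OM node is required");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Move to the next node on every attempt.
            String node = omNodeIds.get(attempt % omNodeIds.size());
            try {
                return call.call(node);
            } catch (IOException e) {
                last = e;
                // Pause before the next attempt so a restarting OM can come back.
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(retryDelayMillis);
                }
            }
        }
        // Same message as in the report; the last failure is kept as the cause.
        throw new IOException("Could not determine or connect to OM Leader.", last);
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes = Arrays.asList("omNode-1", "omNode-2", "omNode-3");
        // Simulated cluster: only omNode-2 is currently able to serve requests.
        String result = callWithFailover(nodes, 6, 10L, node -> {
            if (!node.equals("omNode-2")) {
                throw new IOException(node + " is restarting");
            }
            return "block allocated via " + node;
        });
        System.out.println(result);
    }
}
```

In this sketch the exception seen in the report would surface only after every attempt has failed, which is why the retry budget and delay matter when OMs are restarted in quick succession.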