[
https://issues.apache.org/jira/browse/HDDS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prashant Pogde updated HDDS-830:
--------------------------------
Target Version/s: 1.2.0
I am managing the 1.1.0 release and we currently have more than 600 issues
targeted for 1.1.0. I am moving the target field to 1.2.0.
If you are actively working on this Jira and believe it should be targeted to
the 1.1.0 release, please change the target field back to 1.1.0 before Feb 05,
2021.
> Datanode should not start XceiverServerRatis before getting version
> information from SCM
> ----------------------------------------------------------------------------------------
>
> Key: HDDS-830
> URL: https://issues.apache.org/jira/browse/HDDS-830
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.3.0
> Reporter: Nanda kumar
> Priority: Major
> Labels: TriagePending
>
> If a datanode restarts quickly, before SCM detects the restart, it rejoins the
> Ratis ring (the existing pipeline). Since SCM did not detect the restart, the
> pipeline is not closed. There is a time gap between the datanode starting and
> it receiving the version information from SCM. During this gap, the SCM ID in
> the datanode is not set (null). If a client tries to use the pipeline during
> that time, the container state machine throws
> {{java.lang.NullPointerException: scmId cannot be null}}. This causes
> {{RaftLogWorker}} to terminate, resulting in a datanode crash.
> {code}
> 2018-11-12 19:45:31,811 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(86)) - Terminating with exit status 1: 407fd181-2ff7-4651-9a47-a0927ede4c51-RaftLogWorker failed.
> java.io.IOException: java.lang.NullPointerException: scmId cannot be null
> at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
> at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
> at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
> at org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
> at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:344)
> at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:216)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: scmId cannot be null
> at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:106)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:242)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:206)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:124)
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:274)
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:280)
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:301)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ... 1 more
> {code}
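
One possible shape of the fix suggested by the issue title, as a minimal sketch only: defer starting the Ratis-backed server until the SCM ID from the version response is known. The class `DeferredRatisStart`, the `Server` interface, and the method `onVersionResponse` are hypothetical names made up for illustration; none of them are taken from the Ozone or Ratis code base, and the actual fix may look quite different.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; not the actual Ozone datanode code. */
final class DeferredRatisStart {

    /** Stand-in for the Ratis-backed server (hypothetical interface). */
    interface Server {
        void start();
    }

    // SCM ID stays null until the version response has been processed.
    private final AtomicReference<String> scmId = new AtomicReference<>();
    // Guards against starting the server more than once.
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Server server;

    DeferredRatisStart(Server server) {
        this.server = server;
    }

    /**
     * Assumed to be called once the version information from SCM arrives.
     * The server is started here, never earlier, so no client request can
     * reach the state machine while the SCM ID is still null.
     */
    void onVersionResponse(String id) {
        if (id == null) {
            throw new IllegalArgumentException("scmId cannot be null");
        }
        // Keep the first SCM ID seen; later responses do not overwrite it.
        scmId.compareAndSet(null, id);
        // Start the server exactly once, after the SCM ID is set.
        if (started.compareAndSet(false, true)) {
            server.start();
        }
    }

    boolean isStarted() {
        return started.get();
    }

    String getScmId() {
        return scmId.get();
    }
}
```

In this sketch the datanode would construct the gate at startup without starting the server, and the handler for the SCM version response would call `onVersionResponse`. Repeated version responses are harmless because the start is guarded by a compare-and-set.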