[ https://issues.apache.org/jira/browse/HDDS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-830:
----------------------------
    Target Version/s: 1.3.0  (was: 1.2.0)

I am managing the 1.2.0 release and we currently have more than 600 issues 
targeted for 1.2.0. I am moving the target field to 1.3.0.

If you are actively working on this Jira and believe it should be targeted 
for the 1.2.0 release, please reach out to me via Apache email or Slack.

> Datanode should not start XceiverServerRatis before getting version 
> information from SCM
> ----------------------------------------------------------------------------------------
>
>                 Key: HDDS-830
>                 URL: https://issues.apache.org/jira/browse/HDDS-830
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.3.0
>            Reporter: Nandakumar
>            Priority: Major
>              Labels: TriagePending
>
> If a datanode restarts quickly, before SCM detects the restart, it rejoins 
> the Ratis ring (the existing pipeline). Since SCM did not detect the restart, 
> the pipeline is not closed. There is a time gap between the datanode starting 
> and it receiving the version information from SCM. During this gap, the SCM 
> ID in the datanode is not set (null). If a client tries to use the pipeline 
> during that time, the container state machine throws 
> {{java.lang.NullPointerException: scmId cannot be null}}. This causes 
> {{RaftLogWorker}} to terminate, resulting in a datanode crash.
> {code}
> 2018-11-12 19:45:31,811 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(86)) - Terminating with exit status 1: 407fd181-2ff7-4651-9a47-a0927ede4c51-RaftLogWorker failed.
> java.io.IOException: java.lang.NullPointerException: scmId cannot be null
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
>   at org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
>   at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:344)
>   at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:216)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: scmId cannot be null
>   at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:106)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:242)
>   at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
>   at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:206)
>   at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:124)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:274)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:280)
>   at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:301)
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   ... 1 more
> {code}
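
The ordering problem described above can be illustrated with a small, self-contained sketch. This is not Ozone code and not the fix applied to this issue. The class `StartupGateSketch` and its methods `onVersionResponse`, `isRatisStarted`, and `createContainer` are hypothetical names chosen for illustration. The sketch assumes the Ratis server is represented by a plain `Runnable` and that the SCM ID arrives as a string. It shows two things: the server start is deferred until the SCM ID is known, and a request that arrives earlier is rejected with an ordinary exception instead of reaching a null check deep in the write path.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only; names do not come from the Ozone code base. */
public class StartupGateSketch {
    // SCM ID learned from the (assumed) version handshake; null until then.
    private final AtomicReference<String> scmId = new AtomicReference<>();
    // Ensures the stand-in for the Ratis server is started at most once.
    private final AtomicBoolean ratisStarted = new AtomicBoolean(false);
    // Stand-in for whatever actually starts the Ratis server.
    private final Runnable startRatis;

    public StartupGateSketch(Runnable startRatis) {
        this.startRatis = Objects.requireNonNull(startRatis, "startRatis");
    }

    /** Called when version information from SCM has been received. */
    public void onVersionResponse(String id) {
        Objects.requireNonNull(id, "scmId cannot be null");
        // Keep the first ID seen; later calls do not overwrite it.
        scmId.compareAndSet(null, id);
        // Start the server only now, and only once.
        if (ratisStarted.compareAndSet(false, true)) {
            startRatis.run();
        }
    }

    public boolean isRatisStarted() {
        return ratisStarted.get();
    }

    /** Rejects the request cleanly if the SCM ID is still unknown. */
    public String createContainer(long containerId) {
        String id = scmId.get();
        if (id == null) {
            throw new IllegalStateException(
                "SCM ID not yet known; rejecting container " + containerId);
        }
        return id + "/" + containerId;
    }

    public static void main(String[] args) {
        StartupGateSketch gate =
            new StartupGateSketch(() -> System.out.println("ratis server started"));
        try {
            gate.createContainer(1L);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        gate.onVersionResponse("scm-1");
        System.out.println("created: " + gate.createContainer(1L));
    }
}
```

In this sketch a request that arrives before the handshake fails with an `IllegalStateException` that the caller can handle, rather than terminating a worker thread. Whether the real datanode should defer the server start, reject early requests, or both is a design decision for the actual fix.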



--
This message was sent by Atlassian Jira
(v8.3.4#803005)