[
https://issues.apache.org/jira/browse/HDDS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878599#comment-17878599
]
JiangHua Zhu commented on HDDS-11396:
-------------------------------------
Here is a reproduction of what happened.
I used Thread.sleep() to simulate a heavily loaded machine, so that hddsDispatcher.setClusterId() runs long after the channels have started.
The code is as follows:
{code:java}
writeChannel.start();
readChannel.start();
hddsDispatcher.init();
// Simulate a heavily loaded machine: pause for 30 minutes after the
// channels have started but before clusterId is set.
try {
  Thread.sleep(1800000);
} catch (InterruptedException e) {
  throw new RuntimeException(e);
}
hddsDispatcher.setClusterId(clusterId);
blockDeletingService.start();
{code}
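The window that the sleep opens can be sketched in isolation. This is only a minimal, self-contained illustration, not Ozone code: ClusterIdRaceSketch, Handler, submit and the cluster id string are hypothetical stand-ins, and Objects.requireNonNull is used in place of the Guava Preconditions.checkNotNull call that appears in the stack trace. It assumes only what the snippet above shows, namely that requests can be served once the channels are started while clusterId is still unset.

```java
import java.util.Objects;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-ins; none of these classes are Ozone's real ones.
final class ClusterIdRaceSketch {

  /** Stand-in for a handler whose clusterId is only set after startup. */
  static final class Handler {
    private volatile String clusterId;

    void setClusterId(String id) {
      this.clusterId = id;
    }

    /** Returns "OK", or an error string if clusterId is still null. */
    String createContainer(long containerId) {
      try {
        Objects.requireNonNull(clusterId, "clusterId cannot be null");
        return "OK";
      } catch (NullPointerException e) {
        return "CONTAINER_INTERNAL_ERROR: " + e.getMessage();
      }
    }
  }

  /** Runs one request on the "channel" thread and waits for its result. */
  static String submit(ExecutorService channel, Handler handler, long id)
      throws Exception {
    Callable<String> task = () -> handler.createContainer(id);
    return channel.submit(task).get();
  }

  public static void main(String[] args) throws Exception {
    Handler handler = new Handler();
    // The channel is "started" here, before clusterId is known.
    ExecutorService channel = Executors.newSingleThreadExecutor();
    try {
      // A request arriving inside the window fails.
      String early = submit(channel, handler, 1L);
      handler.setClusterId("hypothetical-cluster-id");
      // The same request after setClusterId() succeeds.
      String late = submit(channel, handler, 2L);
      System.out.println("before setClusterId: " + early);
      System.out.println("after setClusterId: " + late);
    } finally {
      channel.shutdown();
    }
  }
}
```

In this sketch the first request is answered with the "clusterId cannot be null" error and the second with "OK", which mirrors the ordering problem: anything served between the channel start and setClusterId() sees a null clusterId.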
Test steps:
1. SCM configuration:
{code:xml}
<property>
<name>ozone.scm.stale.node.interval</name>
<value>60m</value>
</property>
{code}
2. Stop the Datanode.
3. Restart the Datanode.
4. Have a client write a new file.
The same exception then appears in the Datanode log:
{code:java}
2024-09-02 22:36:10,776 [035bd2b7-37c6-4ad0-b57d-201b4d7eeff9-ChunkWriter-11-0]
WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation:
CreateContainer , Trace ID: , Message: java.lang.NullPointerException:
clusterId cannot be null , Result: CONTAINER_INTERNAL_ERROR ,
StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
java.lang.NullPointerException: clusterId cannot be null
at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:234)
at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:506)
at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:302)
at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:197)
at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
--
at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$writeStateMachineData$3(ContainerStateMachine.java:559)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException: clusterId cannot be null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921)
at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:149)
at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:380)
at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:248)
at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:231)
{code}
> NPE in Handler#clusterId
> ------------------------
>
> Key: HDDS-11396
> URL: https://issues.apache.org/jira/browse/HDDS-11396
> Project: Apache Ozone
> Issue Type: Bug
> Components: DN
> Affects Versions: 1.4.0
> Reporter: JiangHua Zhu
> Assignee: JiangHua Zhu
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2024-08-31-17-26-22-105.png,
> image-2024-09-02-22-53-22-487.png, screenshot-1.png
>
>
> When KeyValueHandler executes handleCreateContainer, it shows that
> Handler#clusterId is null.
> Here are some logs:
> {code:java}
> 2024-08-31 13:29:30,924
> [1134f6e4-49f6-4831-ad13-6bdb8ea23409-ContainerOp-113adaf2-c5da-479c-b677-5ec11ac5d97a-2]
> WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation:
> CreateContainer , Trace ID: , Message: java.lang.NullPointerException:
> clusterId cannot be null , Result: CONTAINER_INTERNAL_ERROR ,
> StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
> java.lang.NullPointerException: clusterId cannot be null
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:225)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:469)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:275)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:179)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
> at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:485)
> at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$9(ContainerStateMachine.java:900)
> at org.apache.ratis.util.TaskQueue.lambda$submit$0(TaskQueue.java:121)
> at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
> at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:78)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: java.lang.NullPointerException: clusterId cannot be null
> at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:148)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:367)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:239)
> at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:222)
> {code}