[ https://issues.apache.org/jira/browse/HDDS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878599#comment-17878599 ]

JiangHua Zhu commented on HDDS-11396:
-------------------------------------

Here is a simulation of what happened.
I used Thread.sleep() to simulate a heavily loaded machine, inserting a long delay between hddsDispatcher.init() and hddsDispatcher.setClusterId(clusterId).
The code is as follows:

{code:java}
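    // The write/read channels are started first, so requests can reach
    // the dispatcher as soon as it is initialized.
    writeChannel.start();
    readChannel.start();
    hddsDispatcher.init();
    // Simulate a heavily loaded machine: stall for 30 minutes (1,800,000 ms)
    // after init() but before the cluster ID is set.
    try {
      Thread.sleep(1800000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    hddsDispatcher.setClusterId(clusterId);
    blockDeletingService.start();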
{code}
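The ordering problem can be reduced to a self-contained sketch. `ClusterIdRaceDemo`, `Handler` and `createContainer` below are hypothetical stand-ins, not the Ozone classes; the only detail taken from the logs is the `clusterId cannot be null` precondition (the trace shows Guava's `Preconditions.checkNotNull`, and `java.util.Objects.requireNonNull` with a message behaves the same way). Instead of sleeping, the sketch waits on a `Future` so that one request is served after the dispatcher is live but before `setClusterId` runs, which makes the failure deterministic.

```java
import java.util.Objects;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical reduction of the init()/setClusterId() ordering race. */
class ClusterIdRaceDemo {

  /** Hypothetical stand-in for Handler: clusterId is assigned after construction. */
  static class Handler {
    private volatile String clusterId;

    void setClusterId(String clusterId) {
      this.clusterId = clusterId;
    }

    /** Same precondition as in the log: fails fast while clusterId is unset. */
    String createContainer(long containerId) {
      Objects.requireNonNull(clusterId, "clusterId cannot be null");
      return "container " + containerId + " created in cluster " + clusterId;
    }
  }

  /** Serves one request before setClusterId() and one after; returns both outcomes. */
  static String[] run() {
    Handler handler = new Handler();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    String[] results = new String[2];
    try {
      // After init() the channels already accept requests, so one can arrive now.
      Future<?> early = pool.submit(() -> {
        try {
          results[0] = handler.createContainer(1L);
        } catch (NullPointerException e) {
          results[0] = "NPE: " + e.getMessage();
        }
      });
      early.get(); // wait for the early request instead of Thread.sleep()
      handler.setClusterId("CID-demo"); // hypothetical cluster ID
      results[1] = handler.createContainer(2L);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return results;
  }

  public static void main(String[] args) {
    for (String line : run()) {
      System.out.println(line);
    }
  }
}
```

Running `main` prints `NPE: clusterId cannot be null` for the early request and `container 2 created in cluster CID-demo` for the later one: the same operation succeeds once the cluster ID has been set, which is why the failure only shows up when the gap between `init()` and `setClusterId()` is long.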

Test steps:
1. SCM configuration:
{code:java}
<property>
  <name>ozone.scm.stale.node.interval</name>
  <value>60m</value>
</property>
{code}

2. Stop the Datanode.
3. Restart the Datanode.
4. Write a new file from a client.
The Datanode then logs the same exception:
{code:java}
2024-09-02 22:36:10,776 [035bd2b7-37c6-4ad0-b57d-201b4d7eeff9-ChunkWriter-11-0] WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation: CreateContainer , Trace ID:  , Message: java.lang.NullPointerException: clusterId cannot be null , Result: CONTAINER_INTERNAL_ERROR , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: java.lang.NullPointerException: clusterId cannot be null
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:234)
        at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:506)
        at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:302)
        at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:197)
        at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
--
        at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$writeStateMachineData$3(ContainerStateMachine.java:559)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException: clusterId cannot be null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:149)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:380)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:248)
        at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:231)
{code}
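The outer message in the log (`StorageContainerException: java.lang.NullPointerException: clusterId cannot be null`) is exactly the `toString()` of the NPE, which is what the standard `Throwable(Throwable cause)` constructor produces as a default message. The sketch below assumes the handler wraps the NPE in that way; `WrappedNpeDemo` and `ContainerOpException` are hypothetical stand-ins, not `StorageContainerException`, whose actual constructors are not shown here. It also includes a small helper that walks `getCause()` down to the `clusterId cannot be null` root cause, which is the line to search for when triaging these logs.

```java
/** Hypothetical sketch of how the NPE surfaces inside a wrapping exception. */
class WrappedNpeDemo {

  /** Hypothetical stand-in for StorageContainerException. */
  static class ContainerOpException extends Exception {
    private static final long serialVersionUID = 1L;

    ContainerOpException(Throwable cause) {
      super(cause); // message defaults to cause.toString()
    }
  }

  /** Wraps the precondition failure the way the outer log line suggests. */
  static ContainerOpException wrap(NullPointerException npe) {
    return new ContainerOpException(npe);
  }

  /** Follows getCause() links to the innermost throwable. */
  static Throwable rootCause(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null) {
      current = current.getCause();
    }
    return current;
  }

  public static void main(String[] args) {
    ContainerOpException wrapped =
        wrap(new NullPointerException("clusterId cannot be null"));
    // Prints: java.lang.NullPointerException: clusterId cannot be null
    System.out.println(wrapped.getMessage());
    // Prints: Root cause: clusterId cannot be null
    System.out.println("Root cause: " + rootCause(wrapped).getMessage());
  }
}
```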



> NPE in Handler#clusterId
> ------------------------
>
>                 Key: HDDS-11396
>                 URL: https://issues.apache.org/jira/browse/HDDS-11396
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: DN
>    Affects Versions: 1.4.0
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-08-31-17-26-22-105.png, image-2024-09-02-22-53-22-487.png, screenshot-1.png
>
>
> When KeyValueHandler executes handleCreateContainer, Handler#clusterId is null.
> Here are some logs:
> {code:java}
> 2024-08-31 13:29:30,924 [1134f6e4-49f6-4831-ad13-6bdb8ea23409-ContainerOp-113adaf2-c5da-479c-b677-5ec11ac5d97a-2] WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation: CreateContainer , Trace ID:  , Message: java.lang.NullPointerException: clusterId cannot be null , Result: CONTAINER_INTERNAL_ERROR , StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: java.lang.NullPointerException: clusterId cannot be null
>       at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:225)
>       at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:469)
>       at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:275)
>       at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:179)
>       at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
>       at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>       at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:485)
>       at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$9(ContainerStateMachine.java:900)
>       at org.apache.ratis.util.TaskQueue.lambda$submit$0(TaskQueue.java:121)
>       at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
>       at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:78)
>       at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
>       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>       at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: java.lang.NullPointerException: clusterId cannot be null
>       at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:921)
>       at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:148)
>       at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:367)
>       at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:239)
>       at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:222)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
