[ https://issues.apache.org/jira/browse/HDDS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842519#comment-17842519 ]

Duong edited comment on HDDS-10780 at 4/30/24 11:53 PM:
--------------------------------------------------------

In my case, the datanode ID seems to be removed from XceiverClientRatis's commitInfoMap by the exception handling (there was a timeout error while waiting for the commit watch). A subsequent watchForCommit then receives a RaftClientReply but fails in updateCommitInfosMap because the datanode has already been removed.
{code:java}
@Override
public XceiverClientReply watchForCommit(long index)
    throws InterruptedException, ExecutionException, TimeoutException,
    IOException {
  ...
  try {
    CompletableFuture<RaftClientReply> replyFuture = getClient().async()
        .watch(index, RaftProtos.ReplicationLevel.ALL_COMMITTED);
    final RaftClientReply reply = replyFuture.get();
    final long updated = updateCommitInfosMap(reply);
    
  } catch (Exception e) {
    ....
    reply.getCommitInfos().stream()
        .filter(i -> i.getCommitIndex() < index)
        .forEach(proto -> {
          UUID address = RatisHelper.toDatanodeId(proto.getServer());
          addDatanodetoReply(address, clientReply);
          // since 3 way commit has failed, the updated map from now on  will
          // only store entries for those datanodes which have had successful
          // replication.
          commitInfoMap.remove(address);
          LOG.info(
              "Could not commit index {} on pipeline {} to all the nodes. " +
              "Server {} has failed. Committed by majority.",
              index, pipeline, address);
        });
    return clientReply;
  }
}
{code}
{code:java}
java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.TimeoutIOException: client-60DB96616032->641319db-0e84-49e2-a93f-2b8c4ea8d744 request #12190 timeout 180s
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
        at org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:279)
        at org.apache.hadoop.hdds.scm.storage.AbstractCommitWatcher.watchForCommit(AbstractCommitWatcher.java:142)
        at org.apache.hadoop.hdds.scm.storage.AbstractCommitWatcher.watchOnFirstIndex(AbstractCommitWatcher.java:104)
        at org.apache.hadoop.hdds.scm.storage.RatisBlockOutputStream.sendWatchForCommit(RatisBlockOutputStream.java:106)
        at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:418)
        at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitForFlushAndCommit(BlockOutputStream.java:389)
        at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFullBuffer(BlockOutputStream.java:376)
        at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.doFlushOrWatchIfNeeded(BlockOutputStream.java:310)
        at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:294)
        at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:137)
        at org.apache.hadoop.ozone.client.io.KeyOutputStream.writeToOutputStream(KeyOutputStream.java:250)
        at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:228)
        at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:208)
        at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:94)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1484)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1107)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1456)
        at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.put(ObjectEndpoint.java:328)
        at jdk.internal.reflect.GeneratedMethodAccessor170.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}
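The suspected race can be sketched with a minimal, self-contained example (hypothetical names, not the actual Ozone code): a plain map stands in for the client's per-datanode commit-index map; once timeout handling removes an entry, a later update that still sees that datanode gets a null lookup, and unboxing it in the long stream throws the NPE.
{code:java}
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

// Minimal sketch of the suspected race (hypothetical names):
// timeout handling drops a datanode entry from the commit-index map, and a
// later update that still lists that datanode unboxes a null lookup -> NPE.
public class CommitMapNpeSketch {

  static String demo() {
    Map<UUID, Long> commitInfoMap = new ConcurrentHashMap<>();
    UUID healthyDn = UUID.randomUUID();
    UUID timedOutDn = UUID.randomUUID();
    commitInfoMap.put(healthyDn, 100L);
    commitInfoMap.put(timedOutDn, 100L);

    // Timeout handling removes the lagging datanode, as in the catch block above.
    commitInfoMap.remove(timedOutDn);

    try {
      // A subsequent reply still carries commit info for timedOutDn;
      // the map lookup returns null, and unboxing in the long stream throws.
      long minIndex = Stream.of(healthyDn, timedOutDn)
          .map(commitInfoMap::get)        // null for timedOutDn
          .mapToLong(Long::longValue)     // NullPointerException here
          .min()
          .getAsLong();
      return "min=" + minIndex;
    } catch (NullPointerException e) {
      return "NPE reproduced";
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
{code}
A guard such as filtering out datanodes absent from the map (or re-inserting the entry on a successful later reply) would avoid the unboxing, at the cost of silently ignoring the removed node's commit info.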



> NullPointerException in watchForCommit
> --------------------------------------
>
>                 Key: HDDS-10780
>                 URL: https://issues.apache.org/jira/browse/HDDS-10780
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Duong
>            Priority: Major
>
> NPE happens during watchForCommit. In updateCommitInfosMap, when a new datanode ID appears in the commitInfoProtos, we see an NPE.
> {code:java}
> java.lang.NullPointerException
>         at java.base/java.util.stream.ReferencePipeline$5$1.accept(ReferencePipeline.java:229)
>         at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
>         at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
>         at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>         at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>         at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
>         at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.base/java.util.stream.LongPipeline.reduce(LongPipeline.java:479)
>         at java.base/java.util.stream.LongPipeline.min(LongPipeline.java:437)
>         at org.apache.hadoop.hdds.scm.XceiverClientRatis.updateCommitInfosMap(XceiverClientRatis.java:154)
>         at java.base/java.util.Optional.map(Optional.java:265)
>         at org.apache.hadoop.hdds.scm.XceiverClientRatis.updateCommitInfosMap(XceiverClientRatis.java:133)
>         at org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:280)
>         at org.apache.hadoop.hdds.scm.storage.AbstractCommitWatcher.watchForCommit(AbstractCommitWatcher.java:142)
>         at org.apache.hadoop.hdds.scm.storage.AbstractCommitWatcher.watchOnFirstIndex(AbstractCommitWatcher.java:104)
>         at org.apache.hadoop.hdds.scm.storage.RatisBlockOutputStream.sendWatchForCommit(RatisBlockOutputStream.java:106)
>         at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:418)
>         at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitForFlushAndCommit(BlockOutputStream.java:389)
>         at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFullBuffer(BlockOutputStream.java:376)
>         at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.doFlushOrWatchIfNeeded(BlockOutputStream.java:310)
>         at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:294)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:137)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.writeToOutputStream(KeyOutputStream.java:250)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:228)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:208)
>         at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:94)
>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1484)
>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1107)
>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1456)
>         at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.put(ObjectEndpoint.java:328)
>         at jdk.internal.reflect.GeneratedMethodAccessor82.invoke(Unknown Source)
>         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>         at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
>         at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
>         at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
>         at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
>         at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
>         at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
>         at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
>         at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
>         at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)
>         at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
>         at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
>         at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
>         at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
>         at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
>         at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
>         at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)
>         at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
>         at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
>         at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
>         at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:359)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
