[ 
https://issues.apache.org/jira/browse/HDDS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDDS-6093.
-------------------------------------
    Resolution: Duplicate

> Improve error handling if a container not found during replication
> ------------------------------------------------------------------
>
>                 Key: HDDS-6093
>                 URL: https://issues.apache.org/jira/browse/HDDS-6093
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> When a datanode receives a request to download / copy a container and the 
> container does not exist in the ContainerMap on that datanode, the caller 
> does not get a useful error message. For example, the caller gets a stack 
> trace like:
> {code}
> 2021-12-08 12:46:50,537 ERROR org.apache.hadoop.ozone.container.replication.GrpcReplicationClient: Download of container 10009 was unsuccessful
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNKNOWN
>       at org.apache.ratis.thirdparty.io.grpc.Status.asRuntimeException(Status.java:533)
>       at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:453)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>       at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> To make things worse, on the source datanode nothing is written to the role 
> log, and instead we get this stack trace in the stderr output:
> {code}
> Dec 08, 2021 12:46:50 PM org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor run
> SEVERE: Exception while executing runnable org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@62026ae8
> java.lang.NullPointerException: Container is not found 10009
>       at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:897)
>       at org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(OnDemandContainerReplicationSource.java:56)
>       at org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(GrpcReplicationService.java:56)
>       at org.apache.hadoop.hdds.protocol.datanode.proto.IntraDatanodeProtocolServiceGrpc$MethodHandlers.invoke(IntraDatanodeProtocolServiceGrpc.java:219)
>       at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
>       at org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
>       at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
>       at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:818)
>       at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>       at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that a NullPointerException is thrown in 
> OnDemandContainerReplicationSource and is not caught by the caller, so the 
> exception bubbles up to Thread.run(), where it lands in stderr.
> The solution is to explicitly handle the null container and throw an 
> IOException, which will be handled and set the response status correctly:
> {code}
>   public void download(CopyContainerRequestProto request,
>       StreamObserver<CopyContainerResponseProto> responseObserver) {
>     long containerID = request.getContainerID();
>     LOG.info("Streaming container data ({}) to other datanode", containerID);
>     try {
>       GrpcOutputStream outputStream =
>           new GrpcOutputStream(responseObserver, containerID, BUFFER_SIZE);
>       source.copyData(containerID, outputStream);
>     } catch (IOException e) {
>       LOG.error("Error streaming container {}", containerID, e);
>       responseObserver.onError(e);
>     }
>   }
> {code}
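
For illustration only, the explicit null handling described above could look roughly like the sketch below. It is not the actual Ozone code: the class ContainerSourceSketch, its in-memory map, and addContainer are hypothetical stand-ins for OnDemandContainerReplicationSource and the datanode's ContainerMap. The only point it shows is replacing an unchecked null check with a checked IOException, which a caller such as download() above already catches.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the replication source; not the Ozone class. */
class ContainerSourceSketch {
  // Assumption: a simple map plays the role of the datanode's ContainerMap.
  private final Map<Long, byte[]> containers = new ConcurrentHashMap<>();

  void addContainer(long containerId, byte[] data) {
    containers.put(containerId, data);
  }

  /** Copies the container bytes, or throws IOException if it is missing. */
  void copyData(long containerId, OutputStream destination)
      throws IOException {
    byte[] data = containers.get(containerId);
    if (data == null) {
      // Checked exception instead of a null check that throws an unchecked
      // NullPointerException, so a catch (IOException e) in the caller sees it.
      throw new IOException("Container " + containerId + " is not found");
    }
    destination.write(data);
  }

  public static void main(String[] args) {
    ContainerSourceSketch source = new ContainerSourceSketch();
    source.addContainer(1L, new byte[] {1, 2, 3});
    try {
      source.copyData(1L, new ByteArrayOutputStream());      // succeeds
      source.copyData(10009L, new ByteArrayOutputStream());  // missing
    } catch (IOException e) {
      // In the real service this is where the error would be logged and
      // passed to responseObserver.onError(e).
      System.err.println("Error streaming container: " + e.getMessage());
    }
  }
}
```

With a change of this shape, the missing-container case would reach the existing catch block, so the error is logged on the source datanode and reported back to the requesting datanode rather than surfacing only on stderr.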



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
