[ 
https://issues.apache.org/jira/browse/RATIS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Song Ziyang resolved RATIS-1802.
--------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

The PR is now merged. Thanks a lot for [~adoroszlai] and [~szetszwo]!

> GrpcServerProtocolService encounters IllegalStateException: call already 
> closed
> -------------------------------------------------------------------------------
>
>                 Key: RATIS-1802
>                 URL: https://issues.apache.org/jira/browse/RATIS-1802
>             Project: Ratis
>          Issue Type: Bug
>          Components: gRPC
>            Reporter: Attila Doroszlai
>            Assignee: Tsz-wo Sze
>            Priority: Major
>             Fix For: 3.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [~szetszwo], I think I've found similar problem as HDDS-8019 in Ratis:
> {code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2022/10/18/17796/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly-output.txt}
> 2022-10-18 03:56:31,290 [grpc-default-executor-3] WARN  
> server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-1: Failed 
> INSTALL_SNAPSHOT request omNode-2->omNode-1#0-t6,notify:(t:4, i:207): 
> org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
> omNode-1@group-523986131536 is not in [STARTING, RUNNING]: current state is 
> CLOSING
> 2022-10-18 03:56:31,291 [grpc-default-executor-3] INFO  
> server.GrpcServerProtocolService 
> (GrpcServerProtocolService.java:onCompleted(140)) - omNode-1: Completed 
> INSTALL_SNAPSHOT, lastRequest: omNode-2->omNode-1#0-t6,notify:(t:4, i:207)
> Oct 18, 2022 3:56:31 AM 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor run
> SEVERE: Exception while executing runnable 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@5891590d
> java.lang.IllegalStateException: call already closed
>       at 
> org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:502)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:214)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:207)
>       at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
>       at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
>       at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:39)
>       at 
> org.apache.ratis.grpc.metrics.intercept.server.MetricServerCall.close(MetricServerCall.java:46)
>       at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
>       at 
> org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onCompleted(GrpcServerProtocolService.java:141)
>       at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onHalfClose(ServerCalls.java:273)
>       at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
>       at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:340)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>       at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:750)
> {code}
> I think no further calls to {{responseObserver}} should be allowed after this:
> {code:title=https://github.com/apache/ratis/blob/212690e215f24eb5d48eff138ec24f3abe8da508/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolService.java#L93-L96}
>     private void handleError(Throwable e, REQUEST request) {
>       GrpcUtil.warn(LOG, () -> getId() + ": Failed " + op + " request " + 
> requestToString(request), e);
>       responseObserver.onError(wrapException(e, request));
>     }
> {code}
> But {{onCompleted}} or {{onError}} may still call 
> {{responseObserver.onCompleted()}}:
> {code:title=https://github.com/apache/ratis/blob/212690e215f24eb5d48eff138ec24f3abe8da508/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolService.java#L140-L156}
>     @Override
>     public void onCompleted() {
>       if (isClosed.compareAndSet(false, true)) {
>         LOG.info("{}: Completed {}, lastRequest: {}", getId(), op, 
> getPreviousRequestString());
>         responseObserver.onCompleted();
>       }
>     }
>     @Override
>     public void onError(Throwable t) {
>       GrpcUtil.warn(LOG, () -> getId() + ": installSnapshot onError, 
> lastRequest: " + getPreviousRequestString(), t);
>       if (isClosed.compareAndSet(false, true)) {
>         Status status = Status.fromThrowable(t);
>         if (status != null && status.getCode() != Status.Code.CANCELLED) {
>           responseObserver.onCompleted();
>         }
>       }
>     }
> {code}
> I think {{handleError}} should set the {{isClosed}} flag to prevent that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to