Attila Doroszlai created RATIS-1802:
---------------------------------------

             Summary: GrpcServerProtocolService encounters 
IllegalStateException: call already closed
                 Key: RATIS-1802
                 URL: https://issues.apache.org/jira/browse/RATIS-1802
             Project: Ratis
          Issue Type: Bug
          Components: gRPC
            Reporter: Attila Doroszlai


[~szetszwo], I think I've found similar problem as HDDS-8019 in Ratis:

{code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2022/10/18/17796/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly-output.txt}
2022-10-18 03:56:31,290 [grpc-default-executor-3] WARN  
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-1: Failed 
INSTALL_SNAPSHOT request omNode-2->omNode-1#0-t6,notify:(t:4, i:207): 
org.apache.ratis.protocol.exceptions.ServerNotReadyException: 
omNode-1@group-523986131536 is not in [STARTING, RUNNING]: current state is 
CLOSING
2022-10-18 03:56:31,291 [grpc-default-executor-3] INFO  
server.GrpcServerProtocolService 
(GrpcServerProtocolService.java:onCompleted(140)) - omNode-1: Completed 
INSTALL_SNAPSHOT, lastRequest: omNode-2->omNode-1#0-t6,notify:(t:4, i:207)
Oct 18, 2022 3:56:31 AM 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@5891590d
java.lang.IllegalStateException: call already closed
        at 
org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:502)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:214)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:207)
        at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
        at 
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
        at 
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:39)
        at 
org.apache.ratis.grpc.metrics.intercept.server.MetricServerCall.close(MetricServerCall.java:46)
        at 
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
        at 
org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onCompleted(GrpcServerProtocolService.java:141)
        at 
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onHalfClose(ServerCalls.java:273)
        at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
        at 
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:340)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
{code}

I think no further calls to {{responseObserver}} should be allowed after this:

{code:title=https://github.com/apache/ratis/blob/212690e215f24eb5d48eff138ec24f3abe8da508/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolService.java#L93-L96}
    private void handleError(Throwable e, REQUEST request) {
      GrpcUtil.warn(LOG, () -> getId() + ": Failed " + op + " request " + 
requestToString(request), e);
      responseObserver.onError(wrapException(e, request));
    }
{code}

But {{onCompleted}} or {{onError}} may still call 
{{responseObserver.onCompleted()}}:

{code:title=https://github.com/apache/ratis/blob/212690e215f24eb5d48eff138ec24f3abe8da508/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolService.java#L140-L156}
    @Override
    public void onCompleted() {
      if (isClosed.compareAndSet(false, true)) {
        LOG.info("{}: Completed {}, lastRequest: {}", getId(), op, 
getPreviousRequestString());
        responseObserver.onCompleted();
      }
    }
    @Override
    public void onError(Throwable t) {
      GrpcUtil.warn(LOG, () -> getId() + ": installSnapshot onError, 
lastRequest: " + getPreviousRequestString(), t);
      if (isClosed.compareAndSet(false, true)) {
        Status status = Status.fromThrowable(t);
        if (status != null && status.getCode() != Status.Code.CANCELLED) {
          responseObserver.onCompleted();
        }
      }
    }
{code}

I think {{handleError}} should set the {{isClosed}} flag to prevent that.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to