Attila Doroszlai created RATIS-1802:
---------------------------------------
Summary: GrpcServerProtocolService encounters
IllegalStateException: call already closed
Key: RATIS-1802
URL: https://issues.apache.org/jira/browse/RATIS-1802
Project: Ratis
Issue Type: Bug
Components: gRPC
Reporter: Attila Doroszlai
[~szetszwo], I think I've found similar problem as HDDS-8019 in Ratis:
{code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2022/10/18/17796/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly-output.txt}
2022-10-18 03:56:31,290 [grpc-default-executor-3] WARN
server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-1: Failed
INSTALL_SNAPSHOT request omNode-2->omNode-1#0-t6,notify:(t:4, i:207):
org.apache.ratis.protocol.exceptions.ServerNotReadyException:
omNode-1@group-523986131536 is not in [STARTING, RUNNING]: current state is
CLOSING
2022-10-18 03:56:31,291 [grpc-default-executor-3] INFO
server.GrpcServerProtocolService
(GrpcServerProtocolService.java:onCompleted(140)) - omNode-1: Completed
INSTALL_SNAPSHOT, lastRequest: omNode-2->omNode-1#0-t6,notify:(t:4, i:207)
Oct 18, 2022 3:56:31 AM
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@5891590d
java.lang.IllegalStateException: call already closed
at
org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:502)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:214)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:207)
at
org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:39)
at
org.apache.ratis.grpc.metrics.intercept.server.MetricServerCall.close(MetricServerCall.java:46)
at
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
at
org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onCompleted(GrpcServerProtocolService.java:141)
at
org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onHalfClose(ServerCalls.java:273)
at
org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at
org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:340)
at
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
at
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}
I think no further calls to {{responseObserver}} should be allowed after this:
{code:title=https://github.com/apache/ratis/blob/212690e215f24eb5d48eff138ec24f3abe8da508/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolService.java#L93-L96}
private void handleError(Throwable e, REQUEST request) {
GrpcUtil.warn(LOG, () -> getId() + ": Failed " + op + " request " +
requestToString(request), e);
responseObserver.onError(wrapException(e, request));
}
{code}
But {{onCompleted}} or {{onError}} may still call
{{responseObserver.onCompleted()}}:
{code:title=https://github.com/apache/ratis/blob/212690e215f24eb5d48eff138ec24f3abe8da508/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolService.java#L140-L156}
@Override
public void onCompleted() {
if (isClosed.compareAndSet(false, true)) {
LOG.info("{}: Completed {}, lastRequest: {}", getId(), op,
getPreviousRequestString());
responseObserver.onCompleted();
}
}
@Override
public void onError(Throwable t) {
GrpcUtil.warn(LOG, () -> getId() + ": installSnapshot onError,
lastRequest: " + getPreviousRequestString(), t);
if (isClosed.compareAndSet(false, true)) {
Status status = Status.fromThrowable(t);
if (status != null && status.getCode() != Status.Code.CANCELLED) {
responseObserver.onCompleted();
}
}
}
{code}
I think {{handleError}} should set the {{isClosed}} flag to prevent that.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)