[
https://issues.apache.org/jira/browse/RATIS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Song Ziyang resolved RATIS-1802.
--------------------------------
Fix Version/s: 3.0.0
Resolution: Fixed
The PR is now merged. Thanks a lot for [~adoroszlai] and [~szetszwo]!
> GrpcServerProtocolService encounters IllegalStateException: call already
> closed
> -------------------------------------------------------------------------------
>
> Key: RATIS-1802
> URL: https://issues.apache.org/jira/browse/RATIS-1802
> Project: Ratis
> Issue Type: Bug
> Components: gRPC
> Reporter: Attila Doroszlai
> Assignee: Tsz-wo Sze
> Priority: Major
> Fix For: 3.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> [~szetszwo], I think I've found similar problem as HDDS-8019 in Ratis:
> {code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2022/10/18/17796/it-flaky/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly-output.txt}
> 2022-10-18 03:56:31,290 [grpc-default-executor-3] WARN
> server.GrpcServerProtocolService (LogUtils.java:warn(122)) - omNode-1: Failed
> INSTALL_SNAPSHOT request omNode-2->omNode-1#0-t6,notify:(t:4, i:207):
> org.apache.ratis.protocol.exceptions.ServerNotReadyException:
> omNode-1@group-523986131536 is not in [STARTING, RUNNING]: current state is
> CLOSING
> 2022-10-18 03:56:31,291 [grpc-default-executor-3] INFO
> server.GrpcServerProtocolService
> (GrpcServerProtocolService.java:onCompleted(140)) - omNode-1: Completed
> INSTALL_SNAPSHOT, lastRequest: omNode-2->omNode-1#0-t6,notify:(t:4, i:207)
> Oct 18, 2022 3:56:31 AM
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor run
> SEVERE: Exception while executing runnable
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@5891590d
> java.lang.IllegalStateException: call already closed
> at
> org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:502)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:214)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:207)
> at
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCall.close(PartialForwardingServerCall.java:48)
> at
> org.apache.ratis.thirdparty.io.grpc.ForwardingServerCall.close(ForwardingServerCall.java:22)
> at
> org.apache.ratis.thirdparty.io.grpc.ForwardingServerCall$SimpleForwardingServerCall.close(ForwardingServerCall.java:39)
> at
> org.apache.ratis.grpc.metrics.intercept.server.MetricServerCall.close(MetricServerCall.java:46)
> at
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:395)
> at
> org.apache.ratis.grpc.server.GrpcServerProtocolService$ServerRequestStreamObserver.onCompleted(GrpcServerProtocolService.java:141)
> at
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onHalfClose(ServerCalls.java:273)
> at
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
> at
> org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:340)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> {code}
> I think no further calls to {{responseObserver}} should be allowed after this:
> {code:title=https://github.com/apache/ratis/blob/212690e215f24eb5d48eff138ec24f3abe8da508/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolService.java#L93-L96}
> private void handleError(Throwable e, REQUEST request) {
> GrpcUtil.warn(LOG, () -> getId() + ": Failed " + op + " request " +
> requestToString(request), e);
> responseObserver.onError(wrapException(e, request));
> }
> {code}
> But {{onCompleted}} or {{onError}} may still call
> {{responseObserver.onCompleted()}}:
> {code:title=https://github.com/apache/ratis/blob/212690e215f24eb5d48eff138ec24f3abe8da508/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolService.java#L140-L156}
> @Override
> public void onCompleted() {
> if (isClosed.compareAndSet(false, true)) {
> LOG.info("{}: Completed {}, lastRequest: {}", getId(), op,
> getPreviousRequestString());
> responseObserver.onCompleted();
> }
> }
> @Override
> public void onError(Throwable t) {
> GrpcUtil.warn(LOG, () -> getId() + ": installSnapshot onError,
> lastRequest: " + getPreviousRequestString(), t);
> if (isClosed.compareAndSet(false, true)) {
> Status status = Status.fromThrowable(t);
> if (status != null && status.getCode() != Status.Code.CANCELLED) {
> responseObserver.onCompleted();
> }
> }
> }
> {code}
> I think {{handleError}} should set the {{isClosed}} flag to prevent that.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)