[
https://issues.apache.org/jira/browse/RATIS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Song Ziyang updated RATIS-1790:
-------------------------------
Description:
h2. 1. Issues related to gRPC logAppender
1. *(100% reproducible)* The gRPC appender times out and fails when installing
a large snapshot on a follower, as previously reported in
https://issues.apache.org/jira/browse/RATIS-1782.
2. *(small probability)* Storms of inconsistent RPCs bounce between the leader
and followers, as previously reported in
https://issues.apache.org/jira/browse/RATIS-1674.
h2. 2. Cause of these issues
The current *+deadline+* configuration of gRPC bidirectional streaming leads to
the issues above.
h2. 3. Dive into gRPC logAppender
The gRPC logAppender creates the stub with a deadline at the beginning of
installSnapshot, as in
[[1]|https://github.com/apache/ratis/blob/655db36c68a4c46a59150b548e76f2e92c33bf84/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java#L597-L600]
and
[[2]|https://github.com/apache/ratis/blob/master/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolClient.java#L140-L141].
{code:java}
snapshotRequestObserver = getClient().installSnapshot(responseHandler);
for (InstallSnapshotRequestProto request :
    newInstallSnapshotRequests(requestId, snapshot)) {
  snapshotRequestObserver.onNext(request);
}
{code}
Notice that the deadline is set for the whole observer, not for each streaming
message. A deadline is a fixed point in time in the future: every streaming
message must complete before that point, otherwise the call is cancelled and
onError is invoked.
I guess the original implementor of installSnapshot treated +deadline+ the same
as {+}timeout{+}, which it is not (see [https://grpc.io/blog/deadlines/] for
the difference). Therefore, the streaming messages do not each get an
independent timeout of 3s (which is what we want), but rather share the same
deadline of (initial_time + 3s). When the snapshot is large, the RPCs sent
later in the stream run past this deadline and fail. This is the cause of the
1st issue mentioned above. Also, the gRPC implementors do not recommend using a
deadline on a streaming stub
(see [https://github.com/grpc/grpc-java/issues/5498#issuecomment-476299936]).
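To make the difference concrete, here is a toy model in plain Java. It does not use gRPC or Ratis at all; the class name, method names, chunk count and per-chunk duration are all hypothetical and chosen only for illustration. It assumes chunks are sent back to back and compares one deadline shared by the whole stream with an independent timeout per message.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (no gRPC involved): one shared deadline vs. a per-message timeout. */
final class DeadlineVsTimeout {

  /**
   * Indices of chunks that finish after a single deadline fixed at
   * (streamStart + limitMs). Chunk i is assumed to finish at (i + 1) * chunkMs.
   */
  static List<Integer> failedWithSharedDeadline(int chunks, long chunkMs, long limitMs) {
    List<Integer> failed = new ArrayList<>();
    long elapsed = 0; // time since the stream was opened
    for (int i = 0; i < chunks; i++) {
      elapsed += chunkMs;
      if (elapsed > limitMs) {
        failed.add(i); // the shared deadline has already passed
      }
    }
    return failed;
  }

  /** Indices of chunks that fail when every chunk gets its own limitMs budget. */
  static List<Integer> failedWithPerMessageTimeout(int chunks, long chunkMs, long limitMs) {
    List<Integer> failed = new ArrayList<>();
    for (int i = 0; i < chunks; i++) {
      if (chunkMs > limitMs) {
        failed.add(i); // only a chunk that is itself too slow fails
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    // Hypothetical numbers: 10 chunks, 500 ms each, 3 s limit.
    System.out.println(failedWithSharedDeadline(10, 500, 3000));    // prints [6, 7, 8, 9]
    System.out.println(failedWithPerMessageTimeout(10, 500, 3000)); // prints []
  }
}
```

With these numbers, every chunk is well within 3s on its own, yet the chunks that finish after the 3s mark fail under the shared deadline. A larger snapshot only adds more failing chunks at the tail.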
AppendEntries is not affected by this deadline problem since it does not assign
a deadline to the stub
[[3]|https://github.com/apache/ratis/blob/655db36c68a4c46a59150b548e76f2e92c33bf84/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolClient.java#L134].
However, not using a deadline also causes some unexpected behavior, as
mentioned in the 2nd issue. Every RPC submitted to the appendEntries observer
never times out and is +guaranteed to be delivered+ to the follower. There can
be up to 128 pending requests in gRPC's sending queue. Consider the case where
the first pending RPC receives an inconsistent reply: we should then cancel
every other RPC in the sending queue. However, these submitted RPCs cannot be
cancelled without a deadline, so all we can do is watch helplessly as they are
sent. These sent requests can cause inconsistency storms; refer to RATIS-1674.
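The effect of the sending queue can be sketched with another toy model in plain Java. Again, this is not gRPC or Ratis code: the class and method names are hypothetical, and "cancellable" simply means the leader is able to drop whatever is still queued once the first reply reports an inconsistency.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (no gRPC or Ratis code): requests queued when the first one is rejected. */
final class PendingQueueModel {

  /**
   * Returns how many already-queued requests still reach the follower after
   * the first request has been answered with an inconsistency reply.
   */
  static int sentAfterInconsistency(int pending, boolean cancellable) {
    Deque<Integer> queue = new ArrayDeque<>();
    for (int i = 0; i < pending; i++) {
      queue.add(i);
    }
    queue.poll(); // the first request is sent and answered as inconsistent
    if (cancellable) {
      queue.clear(); // the leader drops everything that is still queued
    }
    int sent = 0;
    while (queue.poll() != null) {
      sent++; // a stale request that is delivered anyway
    }
    return sent;
  }

  public static void main(String[] args) {
    System.out.println(sentAfterInconsistency(128, false)); // prints 127
    System.out.println(sentAfterInconsistency(128, true));  // prints 0
  }
}
```

With a full queue of 128 and no way to cancel, 127 stale requests are still delivered after the first rejection, and each of them may be answered as inconsistent in turn.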
> Improve gRPC LogAppender's timeout mechanism
> --------------------------------------------
>
> Key: RATIS-1790
> URL: https://issues.apache.org/jira/browse/RATIS-1790
> Project: Ratis
> Issue Type: Improvement
> Components: gRPC, snapshot
> Affects Versions: 2.4.1
> Reporter: Song Ziyang
> Priority: Critical
>