[ 
https://issues.apache.org/jira/browse/RATIS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Song Ziyang updated RATIS-1790:
-------------------------------
    Description: 
h2. 1. Issues related to gRPC logAppender

1. *(100% reproducible)* The gRPC appender times out and fails when installing a 
large snapshot on a follower, as previously reported in 
https://issues.apache.org/jira/browse/RATIS-1782.
2. *(low probability)* Storms of inconsistent RPCs bounce between the leader 
and followers, as previously reported in 
https://issues.apache.org/jira/browse/RATIS-1674.
h2. 2. Cause of these issues

The current *+deadline+* configuration of the gRPC bidirectional streams causes 
both issues above.
h2. 3. Dive into gRPC logAppender

The gRPC logAppender creates the stub with a deadline at the beginning of 
installSnapshot, as in 
[[1]|https://github.com/apache/ratis/blob/655db36c68a4c46a59150b548e76f2e92c33bf84/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java#L597-L600]
 and 
[[2]|https://github.com/apache/ratis/blob/master/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolClient.java#L140-L141].
{code:java}
snapshotRequestObserver = getClient().installSnapshot(responseHandler);
for (InstallSnapshotRequestProto request : newInstallSnapshotRequests(requestId, snapshot)) {
    snapshotRequestObserver.onNext(request);
}
{code}
Notice that the deadline is set for the whole observer, not for each streaming 
message. A deadline is a fixed point in time: every streaming message must 
complete before it, otherwise the call is cancelled and onError is invoked. 
The original implementation of installSnapshot seems to treat +deadline+ the 
same as {+}timeout{+}, but the two are different (see 
[https://grpc.io/blog/deadlines/]). As a result, the streaming messages do not 
each get an independent timeout of 3s (which is what we want); they all share 
the single deadline of (initial_time + 3s). When the snapshot is large, the 
messages sent later in the stream pass this deadline and fail. This is the 
cause of the 1st issue above. 
Also, the gRPC implementors do not recommend using a deadline on a streaming 
stub (see [https://github.com/grpc/grpc-java/issues/5498#issuecomment-476299936]).
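
The difference between a shared deadline and a per-message timeout can be illustrated with a small toy model. This is only a sketch and does not call gRPC or Ratis: the class name, the method names, and the assumption that every snapshot chunk takes 1s to send are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a streamed snapshot; no gRPC involved. All times are in milliseconds. */
final class DeadlineVsTimeoutSketch {

    /**
     * One deadline for the whole stream (initial_time + deadlineMillis).
     * Returns the indices of the chunks that would complete after that deadline.
     * In real gRPC the whole call is cancelled once the deadline passes,
     * so these chunks are the ones that never succeed.
     */
    static List<Integer> failedWithSharedDeadline(long[] chunkMillis, long deadlineMillis) {
        List<Integer> failed = new ArrayList<>();
        long elapsed = 0; // time since the stream was opened
        for (int i = 0; i < chunkMillis.length; i++) {
            elapsed += chunkMillis[i];
            if (elapsed > deadlineMillis) {
                failed.add(i);
            }
        }
        return failed;
    }

    /**
     * An independent timeout per chunk (the behaviour the description asks for).
     * Returns the indices of the chunks that individually take longer than the timeout.
     */
    static List<Integer> failedWithPerMessageTimeout(long[] chunkMillis, long timeoutMillis) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < chunkMillis.length; i++) {
            if (chunkMillis[i] > timeoutMillis) {
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        long[] chunks = {1000, 1000, 1000, 1000, 1000}; // assumed: five chunks, 1s each
        System.out.println("shared deadline of 3s fails chunks: "
            + failedWithSharedDeadline(chunks, 3000));
        System.out.println("per-message timeout of 3s fails chunks: "
            + failedWithPerMessageTimeout(chunks, 3000));
    }
}
```

Under these assumptions the shared 3s deadline fails the last two chunks even though no single chunk is slow, while a per-message timeout of 3s fails none of them.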

AppendEntries is not affected by this deadline problem since it does not assign 
a deadline to the stub 
[[3]|https://github.com/apache/ratis/blob/655db36c68a4c46a59150b548e76f2e92c33bf84/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcServerProtocolClient.java#L134].
 However, not using a deadline also causes unexpected behavior, as mentioned in 
the 2nd issue. An RPC submitted to the appendEntries observer never times out 
and is +guaranteed to be delivered+ to the follower. There can be at most 128 
pending requests in gRPC's sending queue. If the first pending RPC receives an 
inconsistency reply, we should cancel all the other RPCs in the sending queue. 
However, without a deadline these submitted RPCs cannot be cancelled; all we 
can do is watch them being sent. These requests can then cause inconsistency 
storms; see RATIS-1674.
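
The amplification described above can be sketched with a toy queue model. It is not Ratis or gRPC code: the class and method names are hypothetical, and it assumes (as a simplification) that every request queued behind a rejected one is rejected as well, and that the leader sees the first reply before the next request is sent.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the appendEntries sending queue; no gRPC or Ratis involved. */
final class PendingQueueSketch {

    /**
     * Counts the inconsistency replies produced when {@code pending} requests
     * are queued and the first one is rejected by the follower.
     *
     * @param pending     number of requests already submitted to the queue
     * @param cancellable whether the leader is able to drop the requests that
     *                    are still queued once it sees the first rejection
     */
    static int inconsistencyReplies(int pending, boolean cancellable) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < pending; i++) {
            queue.add(i);
        }
        int replies = 0;
        while (!queue.isEmpty()) {
            queue.poll(); // the request is sent to the follower
            replies++;    // assumed: the follower rejects it as inconsistent
            if (cancellable) {
                queue.clear(); // leader drops everything still waiting
            }
        }
        return replies;
    }

    public static void main(String[] args) {
        System.out.println("un-cancellable queue: " + inconsistencyReplies(128, false));
        System.out.println("cancellable queue:    " + inconsistencyReplies(128, true));
    }
}
```

With the 128-request limit mentioned above, the un-cancellable queue yields 128 inconsistency replies in this model, while a cancellable queue yields one.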


> Improve gRPC LogAppender's timeout mechanism
> --------------------------------------------
>
>                 Key: RATIS-1790
>                 URL: https://issues.apache.org/jira/browse/RATIS-1790
>             Project: Ratis
>          Issue Type: Improvement
>          Components: gRPC, snapshot
>    Affects Versions: 2.4.1
>            Reporter: Song Ziyang
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.10#820010)