[jira] [Comment Edited] (RATIS-592) One node ratis writes fail forever after first NotLeaderException or LeaderNotReadyException

Siddharth Wagle (JIRA) Fri, 21 Jun 2019 09:52:39 -0700


    [ 
https://issues.apache.org/jira/browse/RATIS-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869678#comment-16869678
 ]


Siddharth Wagle edited comment on RATIS-592 at 6/21/19 4:51 PM:
----------------------------------------------------------------

[~ljain] Point 1 sounds pretty serious. In yesterday's discussion with 
[~szetszwo], the alternative was to throw LeaderNotReady for one-node Ratis, 
but he was not convinced that makes total sense, hence this approach. Any 
better suggestions? What about using something like a NoReplyException for a 
stream with a null reply? AlreadyClosed seems wrong because the stream isn't 
closed, right?
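
For illustration only, here is a minimal Java sketch of the distinction the 
comment is after. It assumes a hypothetical NoReplyException (not an existing 
Ratis class) and uses local stand-ins for both exception types. A request on a 
stream whose reply map is null, but which the client never closed, raises the 
dedicated exception. A retry policy can then reconnect on that exception while 
still giving up on a genuine AlreadyClosedException.

```java
/**
 * Hypothetical sketch: choose a distinct exception for "stream lost its reply
 * map" versus "client was closed". None of these names are Ratis APIs.
 */
public class ExceptionChoiceSketch {

  /** Local stand-in for an "already closed" error. */
  public static class AlreadyClosedException extends RuntimeException {
    public AlreadyClosedException(String m) { super(m); }
  }

  /** Hypothetical: the stream has no reply map, but the client never closed it. */
  public static class NoReplyException extends RuntimeException {
    public NoReplyException(String m) { super(m); }
  }

  /** Picks the exception for a request on a stream with a null reply map. */
  public static RuntimeException exceptionForNullReplyMap(boolean clientClosed) {
    return clientClosed
        ? new AlreadyClosedException("client closed")
        : new NoReplyException("stream has no reply map; reconnect and retry");
  }

  /** A retry policy that retries only when the stream, not the client, is gone. */
  public static boolean shouldRetry(RuntimeException e) {
    return e instanceof NoReplyException;
  }

  public static void main(String[] args) {
    System.out.println(shouldRetry(exceptionForNullReplyMap(false))); // true
    System.out.println(shouldRetry(exceptionForNullReplyMap(true)));  // false
  }
}
```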


was (Author: swagle):
[~ljain] Point 1 sounds pretty serious. In yesterday's discussion with 
[~szetszwo], the alternative was to throw LeaderNotReady for one-node Ratis, 
but he was not convinced that makes total sense, hence this approach. Any 
better suggestions? What about not using something like a NoReplyException for 
a stream with a null reply? AlreadyClosed seems wrong because the stream isn't 
closed, right?

> One node ratis writes fail forever after first NotLeaderException or 
> LeaderNotReadyException
> --------------------------------------------------------------------------------------------
>
>                 Key: RATIS-592
>                 URL: https://issues.apache.org/jira/browse/RATIS-592
>             Project: Ratis
>          Issue Type: Bug
>          Components: gRPC
>    Affects Versions: 0.3.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>            Priority: Critical
>             Fix For: 0.4.0
>
>         Attachments: RATIS-592.01.patch, RATIS-592.02.patch, 
> RATIS-592.03.patch
>
>
> RATIS-571 modified GrpcClientProtocolClient so that it no longer sets the 
> AsyncStreamObserver reference to null on an exception; however, the ReplyMap 
> reference is set to null. As a result, when the client retries after a 
> NotLeaderException or LeaderNotReadyException, it gets an 
> AlreadyClosedException on the stream and never recovers. This is common in 
> unit tests where a request is sent immediately after the cluster comes up.
> There is nothing specific to one-node Ratis here; however, the failing HDDS 
> unit tests all use one-node Ratis. The most probable reason is that on a 
> three-node ring the client retries a different node each time, which 
> increases the chance of success.

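The failure mode in the description above can be modeled in a few lines. The 
sketch below is a simplified, hypothetical stand-in for the client stream 
state, not the actual GrpcClientProtocolClient code: the class, field, and 
method names are made up, and AlreadyClosedException is a local class rather 
than the Ratis one. With the behavior described in the issue (observer kept, 
reply map nulled), every retry after the first error is rejected. One possible 
fix, not necessarily the one in the attached patches, is to drop both 
references so that the next send opens a fresh stream.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of the client-side stream state. */
public class StreamStateModel {

  /** Local stand-in for an "already closed" error. */
  public static class AlreadyClosedException extends RuntimeException {
    public AlreadyClosedException(String msg) { super(msg); }
  }

  /** Stand-in for the request-side stream observer. */
  private Object observer;
  /** Pending replies keyed by call id; null is treated as "stream closed". */
  private Map<Long, CompletableFuture<String>> replyMap;
  private final boolean resetObserverOnError;
  private long nextCallId = 0;

  public StreamStateModel(boolean resetObserverOnError) {
    this.resetObserverOnError = resetObserverOnError;
  }

  /** Sends a request, lazily opening a stream if there is none. */
  public CompletableFuture<String> send(String request) {
    if (observer == null) {
      // No stream yet (or it was reset): open one with a fresh reply map.
      observer = new Object();
      replyMap = new ConcurrentHashMap<>();
    }
    if (replyMap == null) {
      // Observer kept but reply map gone: the stream looks closed forever.
      throw new AlreadyClosedException("stream already closed");
    }
    CompletableFuture<String> reply = new CompletableFuture<>();
    replyMap.put(nextCallId++, reply);
    return reply;
  }

  /** Called when the stream fails, e.g. with a NotLeaderException. */
  public void onError(Throwable t) {
    if (replyMap != null) {
      replyMap.values().forEach(f -> f.completeExceptionally(t));
    }
    replyMap = null;     // behavior described in the issue
    if (resetObserverOnError) {
      observer = null;   // fix sketch: drop both so the next send reconnects
    }
  }

  /** Sends, fails the stream, then retries; true if the retry is accepted. */
  public static boolean retryAfterError(boolean resetObserverOnError) {
    StreamStateModel client = new StreamStateModel(resetObserverOnError);
    client.send("write-1");
    client.onError(new IllegalStateException("simulated NotLeaderException"));
    try {
      client.send("write-1 (retry)");
      return true;
    } catch (AlreadyClosedException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("keep observer, null reply map -> retry ok? " + retryAfterError(false));
    System.out.println("reset both -> retry ok? " + retryAfterError(true));
  }
}
```

Running main prints false for the first case and true for the second. In the 
first case nothing ever resets the observer, so every later send hits the same 
null reply map, which matches "never recovers".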


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
