[jira] [Commented] (RATIS-592) One node ratis writes fail forever after first NotLeaderException or LeaderNotReadyException

Tsz Wo Nicholas Sze (JIRA) Tue, 25 Jun 2019 15:31:07 -0700


    [ 
https://issues.apache.org/jira/browse/RATIS-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872776#comment-16872776
 ]


Tsz Wo Nicholas Sze commented on RATIS-592:
-------------------------------------------

BTW, would the shouldReconnect changes described in [this 
comment|https://issues.apache.org/jira/browse/RATIS-592?focusedCommentId=16870385&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16870385]
 be enough to fix the single-node problem, since it will reconnect on 
AlreadyClosedException?

> One node ratis writes fail forever after first NotLeaderException or 
> LeaderNotReadyException
> --------------------------------------------------------------------------------------------
>
>                 Key: RATIS-592
>                 URL: https://issues.apache.org/jira/browse/RATIS-592
>             Project: Ratis
>          Issue Type: Bug
>          Components: gRPC
>    Affects Versions: 0.3.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>            Priority: Critical
>             Fix For: 0.4.0
>
>         Attachments: RATIS-592.01.patch, RATIS-592.02.patch, 
> RATIS-592.03.patch, RATIS-592.04.patch, RATIS-592.05.patch, 
> RATIS-592.06.patch, RATIS-592.07.patch, RATIS-592.08.patch
>
>
> RATIS-571 modified GrpcClientProtocolClient so that the AsyncStreamObserver 
> reference is no longer set to null on an exception; however, the ReplyMap 
> reference is still set to null. As a result, when the client retries after a 
> NotLeaderException or a LeaderNotReadyException, it gets an 
> AlreadyClosedException on the stream and never recovers. This is common in 
> unit tests, where a request is sent immediately after the cluster comes up.
> Nothing here is specific to one-node Ratis; however, the HDDS unit tests 
> that fail all use one-node Ratis. The most probable reason is that on a 
> three-node ring the client retries a different node each time, which 
> increases the chance of success.
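
The failure mode in the description above can be modelled roughly as 
follows. This is only a sketch and not the actual GrpcClientProtocolClient 
code: the classes Stream and Client, the reconnectOnClosed flag, and the 
local AlreadyClosedException are all hypothetical stand-ins. It assumes that 
an error clears the reply map while the client keeps reusing the same 
stream, and it contrasts that with a client that rebuilds the stream when it 
sees the closed-stream exception.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaleStreamSketch {
  /** Hypothetical stand-in for the exception named in the issue. */
  static class AlreadyClosedException extends RuntimeException {
    AlreadyClosedException(String message) {
      super(message);
    }
  }

  /** Hypothetical model of one client stream with a map of pending replies. */
  static class Stream {
    private volatile Map<Long, String> replies = new ConcurrentHashMap<>();

    void send(long callId, String request) {
      final Map<Long, String> map = replies;
      if (map == null) {
        // The reply map was cleared by an earlier error: nothing can be sent.
        throw new AlreadyClosedException("stream is closed; callId=" + callId);
      }
      map.put(callId, request);
    }

    /** Models an error reply (for example NotLeader): the reply map is dropped. */
    void onError() {
      replies = null;
    }
  }

  /** Hypothetical client that caches a single stream across retries. */
  static class Client {
    private Stream stream = new Stream();
    private final boolean reconnectOnClosed;

    Client(boolean reconnectOnClosed) {
      this.reconnectOnClosed = reconnectOnClosed;
    }

    void failCurrentStream() {
      stream.onError();
    }

    /** Returns true if the request was accepted within maxAttempts attempts. */
    boolean sendWithRetry(long callId, String request, int maxAttempts) {
      for (int attempt = 0; attempt < maxAttempts; attempt++) {
        try {
          stream.send(callId, request);
          return true;
        } catch (AlreadyClosedException e) {
          if (reconnectOnClosed) {
            // Assumed fix: replace the dead stream before the next attempt.
            stream = new Stream();
          }
          // Otherwise the same dead stream is reused and every retry fails.
        }
      }
      return false;
    }
  }

  public static void main(String[] args) {
    Client stuck = new Client(false);
    stuck.failCurrentStream();
    System.out.println("without reconnect: " + stuck.sendWithRetry(1L, "write", 5));

    Client recovering = new Client(true);
    recovering.failCurrentStream();
    System.out.println("with reconnect: " + recovering.sendWithRetry(1L, "write", 5));
  }
}
```

In this model the first client never succeeds no matter how many retries it 
is given, while the second succeeds on its second attempt. With a single 
node there is no other peer whose stream the client could fall back to, 
which matches the observation that the one-node tests are the ones that fail.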



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
