[
https://issues.apache.org/jira/browse/RATIS-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872776#comment-16872776
]
Tsz Wo Nicholas Sze commented on RATIS-592:
-------------------------------------------
BTW, would the shouldReconnect changes described in [this
comment|https://issues.apache.org/jira/browse/RATIS-592?focusedCommentId=16870385&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16870385]
be enough to fix the single-node problem, since the client will reconnect on
AlreadyClosedException?
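To make the question concrete, here is a minimal, hypothetical sketch of a reconnect predicate that treats an AlreadyClosedException, including one wrapped as a cause, as a signal to rebuild the connection instead of reusing the closed stream. The class `ReconnectPolicySketch` and the local `AlreadyClosedException` stand-in are assumptions for illustration only; they are not the actual Ratis code or the change proposed in the linked comment.

```java
import java.io.IOException;

/**
 * Hypothetical reconnect predicate. The nested exception is a stand-in for
 * Ratis's AlreadyClosedException so that this example compiles on its own.
 */
public class ReconnectPolicySketch {

    /** Stand-in for the Ratis AlreadyClosedException (assumption). */
    public static class AlreadyClosedException extends IOException {
        public AlreadyClosedException(String msg) { super(msg); }
    }

    /**
     * Returns true when the client should tear down and rebuild its
     * connection before retrying, i.e. when the failure (or any cause in its
     * chain) says the stream is already closed.
     */
    public static boolean shouldReconnect(Throwable e) {
        // Walk the cause chain so a wrapped AlreadyClosedException is still found.
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof AlreadyClosedException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldReconnect(new AlreadyClosedException("closed")));        // true
        System.out.println(shouldReconnect(
                new RuntimeException(new AlreadyClosedException("wrapped"))));             // true
        System.out.println(shouldReconnect(new IllegalStateException("other")));          // false
    }
}
```

Unwrapping causes matters in a sketch like this because async client code often surfaces failures wrapped in another exception (for example a CompletionException), and a check on the outermost type alone would miss them.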
> One node ratis writes fail forever after first NotLeaderException or
> LeaderNotReadyException
> --------------------------------------------------------------------------------------------
>
> Key: RATIS-592
> URL: https://issues.apache.org/jira/browse/RATIS-592
> Project: Ratis
> Issue Type: Bug
> Components: gRPC
> Affects Versions: 0.3.0
> Reporter: Siddharth Wagle
> Assignee: Siddharth Wagle
> Priority: Critical
> Fix For: 0.4.0
>
> Attachments: RATIS-592.01.patch, RATIS-592.02.patch,
> RATIS-592.03.patch, RATIS-592.04.patch, RATIS-592.05.patch,
> RATIS-592.06.patch, RATIS-592.07.patch, RATIS-592.08.patch
>
>
> RATIS-571 modified the GrpcClientProtocolClient so that it no longer sets the
> AsyncStreamObserver reference to null on an exception; however, the ReplyMap
> reference is still set to null. As a result, when the client retries after a
> NotLeaderException or a LeaderNotReadyException, it gets an
> AlreadyClosedException on the stream and never recovers. This is common in
> unit tests where a request is sent immediately after the cluster comes up.
> There is nothing special about one-node Ratis here; however, the HDDS unit
> tests that fail all use one-node Ratis. The most probable reason is that on a
> three-node ring the client retries a different node each time, which
> increases the chance of success.
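The failure mode in the description can be modeled in a few lines. This is a simplified, hypothetical sketch: `Stream`, its reply map, and `StaleReplyMapSketch` are stand-ins rather than Ratis classes, and discarding the stale stream on error is one assumed recovery strategy, not necessarily what the attached patches do. Keeping the stream object while nulling its reply map makes the retry fail with AlreadyClosedException; replacing the stream lets the retry succeed.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the RATIS-592 failure: the stream object survives an
 * error, but its reply map does not. These are not the real Ratis classes.
 */
public class StaleReplyMapSketch {

    /** Stand-in for Ratis's AlreadyClosedException (assumption). */
    public static class AlreadyClosedException extends RuntimeException {
        public AlreadyClosedException(String msg) { super(msg); }
    }

    /** Simplified stand-in for the client's async stream and its reply map. */
    static class Stream {
        private Map<Long, String> replyMap = new HashMap<>();

        void send(long callId) {
            if (replyMap == null) {
                // The stream reference is still held, but its reply map is gone.
                throw new AlreadyClosedException("reply map already closed");
            }
            replyMap.put(callId, "request-" + callId);
        }

        /** On an exception, only the reply map is nulled out. */
        void onError() {
            replyMap = null;
        }
    }

    private Stream stream = new Stream();
    private final boolean resetStreamOnError;

    public StaleReplyMapSketch(boolean resetStreamOnError) {
        this.resetStreamOnError = resetStreamOnError;
    }

    /** Simulates a NotLeader/LeaderNotReady failure followed by one retry. */
    public boolean sendWithRetry(long callId) {
        stream.onError();            // first attempt fails; reply map is nulled
        if (resetStreamOnError) {
            stream = new Stream();   // assumed fix: drop the stale stream as well
        }
        try {
            stream.send(callId);     // the retry uses whatever stream is held
            return true;
        } catch (AlreadyClosedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(new StaleReplyMapSketch(false).sendWithRetry(1)); // false
        System.out.println(new StaleReplyMapSketch(true).sendWithRetry(1));  // true
    }
}
```

Calling `sendWithRetry` repeatedly on the non-resetting instance keeps returning false, which mirrors the "never recovers" symptom: nothing in the retry path ever replaces the stale stream.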
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)