[
https://issues.apache.org/jira/browse/RATIS-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048698#comment-18048698
]
Ivan Andika edited comment on RATIS-2350 at 1/2/26 3:23 AM:
------------------------------------------------------------
I'm going on a tangent here, but I was wondering whether we can make ReadIndex
return the appliedIndex instead of the commitIndex (the index the Raft paper
specifies) to reduce latency.
The main argument is that, since a write to the leader returns only after the
leader has applied the write transaction, a read can still see the latest
acknowledged write even if ReadIndex returns the applied index.
This reduces the time a follower waits to reach the leader’s ReadIndex (see
ReadRequests#waitForAdvance), since the applied index is always less than or
equal to the commit index. Under stable conditions (i.e. no network partitions
or unexpected elections), I think this might be a valid strategy.
There might be subtle issues and edge cases that violate the consistency
guarantee; I'm trying to find one. I think that a more formal approach
(e.g. TLA+) would make any such issue more obvious.
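To make the latency argument concrete, here is a minimal Java sketch. It is
not Ratis code: the Follower class, its waitForAdvance and applyUpTo methods,
and the index values 10 and 12 are all made up for illustration
(waitForAdvance only borrows its name from ReadRequests#waitForAdvance). It
shows the wait time only, and says nothing about safety across leader changes.

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

final class ReadIndexSketch {
  /** Hypothetical follower: a read is served once appliedIndex >= readIndex. */
  static final class Follower {
    private long appliedIndex;
    private final TreeMap<Long, CompletableFuture<Long>> waiting = new TreeMap<>();

    Follower(long appliedIndex) {
      this.appliedIndex = appliedIndex;
    }

    /** Returns a future that completes when this follower has applied up to readIndex. */
    synchronized CompletableFuture<Long> waitForAdvance(long readIndex) {
      if (appliedIndex >= readIndex) {
        return CompletableFuture.completedFuture(appliedIndex);
      }
      return waiting.computeIfAbsent(readIndex, i -> new CompletableFuture<>());
    }

    /** Simulates the follower applying log entries up to the given index. */
    synchronized void applyUpTo(long index) {
      appliedIndex = Math.max(appliedIndex, index);
      // Complete every pending read whose read index has now been reached.
      NavigableMap<Long, CompletableFuture<Long>> ready = waiting.headMap(appliedIndex, true);
      for (CompletableFuture<Long> f : ready.values()) {
        f.complete(appliedIndex);
      }
      ready.clear();
    }
  }

  public static void main(String[] args) {
    // Assumed state: the write at index 10 is applied and acknowledged;
    // entries 11 and 12 are committed on the leader but not applied yet.
    long leaderCommitIndex = 12;
    long leaderAppliedIndex = 10;
    Follower follower = new Follower(10);

    CompletableFuture<Long> viaCommit = follower.waitForAdvance(leaderCommitIndex);
    CompletableFuture<Long> viaApplied = follower.waitForAdvance(leaderAppliedIndex);
    System.out.println("readIndex=commitIndex served immediately:  " + viaCommit.isDone());
    System.out.println("readIndex=appliedIndex served immediately: " + viaApplied.isDone());

    follower.applyUpTo(12);
    System.out.println("commitIndex read served after applying 12: " + viaCommit.isDone());
  }
}
```

In this toy setup the read based on the applied index is served right away and
still covers the acknowledged write at index 10, while the read based on the
commit index has to wait for the follower to apply entries 11 and 12.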
Also, I found other possible ReadIndex improvements:
[https://github.com/drmingdrmer/consensus-essence/blob/main/src/list/raft-read-index/raft-read-index.md]
(I think this is already implemented in Ratis)
https://github.com/drmingdrmer/consensus-essence/blob/main/src/list/raft-read-index-relaxed-order/raft-read-index-relaxed-order.md
> Fix readAfterWrite bugs
> -----------------------
>
> Key: RATIS-2350
> URL: https://issues.apache.org/jira/browse/RATIS-2350
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: image-2025-12-31-10-28-28-475.png, screenshot-1.png,
> screenshot-2.png
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> There are bugs in handling readAfterWrite requests:
> # In LeaderStateImpl.getReadIndex(..), it should use the max of
> readAfterWriteConsistentIndex and commitIndex.
> # In WriteIndexCache.add(..), it should combine the current future with
> the previous future when the previous future exists.
> Improvement:
> - Add lastAppliedIndex to ReadIndexQueue
> - Replace Consumer<Long> with LongConsumer
> Bug in tests:
> - In LinearizableReadTests.runTestReadAfterWrite(..), it tries to assert the
> following:
> {quote}Assertion: _read-after-write is more consistent than linearizable read_
> {quote}
> Recall the definitions:
> {quote}Read-after-write: _Within the same client, a read called after a write
> must be able to see the change made by the write._
> {quote}
> {quote}Linearizable read: _The read is linearizable (i.e. it won't read stale
> data)._
> {quote}
> Suppose readIndex is 9 and writeIndex is 10. By definition, read-after-write
> must return a state at some log index A >= 10, while linearizable read must
> return a state at some log index L >= 9. The assertion incorrectly checks
> whether A >= L, which is not a requirement. It is perfectly fine, for
> example, if A=11 < L=12.
> ----
> Original Summary: TestReadOnlyRequestWithGrpc may fail intermittently
> Original Description:
> {code:java}
> org.apache.ratis.grpc.TestReadOnlyRequestWithGrpc.testReadAfterWrite -- Time elapsed: 1.572 s <<< FAILURE!
> org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
>     at org.apache.ratis.ReadOnlyRequestTests.testReadAfterWriteImpl(ReadOnlyRequestTests.java:314)
>     at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:143)
>     at org.apache.ratis.server.impl.MiniRaftCluster$Factory$Get.runWithNewCluster(MiniRaftCluster.java:121)
>     at org.apache.ratis.ReadOnlyRequestTests.testReadAfterWrite(ReadOnlyRequestTests.java:289)
> {code}
> It failed 8 times in [this 10x10
> run|https://github.com/apache/ratis/actions/runs/19023405871/job/54322726144].
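
The two fixes and the test example in the quoted description can be sketched
roughly as follows. This is only an illustration and not the actual patch: the
class and method names are hypothetical stand-ins, the client id is a plain
String, and "combine" is assumed here to mean a future that completes after
both writes and yields the larger index. The sketch also assumes that the
writeIndex of the example plays the role of readAfterWriteConsistentIndex and
the readIndex plays the role of commitIndex.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ReadAfterWriteSketch {
  /** Hypothetical stand-in for fix #1: never return an index below the client's last write. */
  static long getReadIndex(long readAfterWriteConsistentIndex, long commitIndex) {
    return Math.max(readAfterWriteConsistentIndex, commitIndex);
  }

  /** Hypothetical stand-in for fix #2: a per-client cache of pending write indices. */
  static final class WriteIndexCacheSketch {
    private final ConcurrentMap<String, CompletableFuture<Long>> cache = new ConcurrentHashMap<>();

    /** Combines the new future with the previous one instead of overwriting it. */
    void add(String clientId, CompletableFuture<Long> current) {
      cache.merge(clientId, current, WriteIndexCacheSketch::combine);
    }

    /** Returns the cached future, or a completed future with -1 if the client has no writes. */
    CompletableFuture<Long> get(String clientId) {
      return cache.getOrDefault(clientId, CompletableFuture.completedFuture(-1L));
    }

    // Assumption: the combined future waits for both writes and keeps the larger index.
    private static CompletableFuture<Long> combine(
        CompletableFuture<Long> previous, CompletableFuture<Long> current) {
      return previous.thenCombine(current, Long::max);
    }
  }

  public static void main(String[] args) {
    long readIndex = 9;   // values from the example in the description
    long writeIndex = 10;
    System.out.println("read-after-write read index: " + getReadIndex(writeIndex, readIndex));

    // A=11 and L=12 satisfy both definitions even though A < L.
    long a = 11;
    long l = 12;
    System.out.println("A >= writeIndex and L >= readIndex: " + (a >= writeIndex && l >= readIndex));
    System.out.println("old assertion A >= L: " + (a >= l));

    WriteIndexCacheSketch cache = new WriteIndexCacheSketch();
    CompletableFuture<Long> firstWrite = CompletableFuture.completedFuture(10L);
    CompletableFuture<Long> secondWrite = new CompletableFuture<>();
    cache.add("client-1", firstWrite);
    cache.add("client-1", secondWrite);
    System.out.println("combined future done before second write: " + cache.get("client-1").isDone());
    secondWrite.complete(11L);
    System.out.println("combined write index: " + cache.get("client-1").join());
  }
}
```

With max in getReadIndex, a read-after-write request in the example waits for
index 10 rather than 9. The combined future is not done until the second write
completes, so a later read cannot skip a write that is still in flight.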