[ https://issues.apache.org/jira/browse/RATIS-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Andika updated RATIS-2524:
-------------------------------
Description:
Currently, each read triggers its own ReadIndex call. If network overhead is high,
these calls can become the bottleneck.
One improvement is to coalesce concurrent reads into a single ReadIndex call.
Rule: A ReadIndex result may only serve reads whose invocation happened before
the ReadIndex request was logically issued.
{code:java}
t1: read A arrives at follower
t2: read B arrives at follower
t3: follower sends one ReadIndex request for batch [A, B]
t4: leader processes ReadIndex and returns index I
t5: follower waits until its applied index >= I
t6: A and B query local state and complete
{code}
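It's not the following, where B arrives after the ReadIndex request was issued
(I may not include writes that completed before B was invoked, so serving B
from it could return stale data and break linearizability):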
{code:java}
t1: read A arrives
t2: follower sends ReadIndex request
t3: leader processes it
t4: read B arrives
t5: follower attaches B to A's ReadIndex result
{code}
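A minimal sketch of the rule, assuming a hypothetical per-follower logical clock that is ticked both when a read arrives and when a ReadIndex request is issued. `ReadIndexRuleDemo`, `arrive`, `issue` and `mayServe` are illustrative names only, not Ratis APIs; `main` replays the two timelines above.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ReadIndexRuleDemo {
  // Hypothetical logical clock: ticked on every read arrival and every ReadIndex issue.
  private static final AtomicLong CLOCK = new AtomicLong();

  /** A read remembers the logical time at which it was invoked. */
  public static final class Read {
    public final String name;
    public final long invokedAt;

    Read(String name, long invokedAt) {
      this.name = name;
      this.invokedAt = invokedAt;
    }
  }

  /** A ReadIndex request remembers the logical time at which it was issued. */
  public static final class ReadIndexRequest {
    public final long issuedAt;

    ReadIndexRequest(long issuedAt) {
      this.issuedAt = issuedAt;
    }
  }

  public static Read arrive(String name) {
    return new Read(name, CLOCK.incrementAndGet());
  }

  public static ReadIndexRequest issue() {
    return new ReadIndexRequest(CLOCK.incrementAndGet());
  }

  /** The rule: a result may serve a read only if the read was invoked before the request was issued. */
  public static boolean mayServe(Read read, ReadIndexRequest request) {
    return read.invokedAt < request.issuedAt;
  }

  public static void main(String[] args) {
    // Allowed timeline: A and B arrive, then one ReadIndex is issued for [A, B].
    Read a = arrive("A");
    Read b = arrive("B");
    ReadIndexRequest batch = issue();
    System.out.println(mayServe(a, batch) + " " + mayServe(b, batch)); // true true

    // Disallowed timeline: B arrives after the request, so it needs a later ReadIndex.
    Read a2 = arrive("A");
    ReadIndexRequest early = issue();
    Read b2 = arrive("B");
    System.out.println(mayServe(a2, early) + " " + mayServe(b2, early)); // true false
  }
}
```

Running `main` prints `true true` for the first timeline and `true false` for the second: B in the second timeline has to wait for a ReadIndex request issued after its arrival.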
This can be implemented using a batching window with a small batching interval
(e.g. 500 microseconds or less, depending on the average latency). Reads that
arrive during the batching interval are collected into one ReadIndex batch. When
the interval ends, the batch is sealed (i.e. no more reads will be added to it)
and a single ReadIndex request is sent that covers all the reads in the sealed
batch (e.g. if the batch holds 5 read requests, the cost of 1 ReadIndex call is
amortized over all 5 reads). New reads go to the next ReadIndex batch.
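A minimal sketch of the batching window, assuming a hypothetical `ReadIndexSender` interface that stands in for the follower-to-leader ReadIndex RPC (it is not an existing Ratis API). Each `readIndex()` call joins the open batch; the first read of a batch schedules `sealAndSend()` after `windowMicros`, which seals the batch and sends one ReadIndex request whose result completes every read in it. Waiting until the applied index reaches I (t5) is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ReadIndexCoalescer {
  /** Hypothetical stand-in for the follower-to-leader ReadIndex RPC; completes with index I. */
  public interface ReadIndexSender {
    CompletableFuture<Long> sendReadIndex();
  }

  private final ReadIndexSender sender;
  private final long windowMicros;
  private final ScheduledExecutorService timer;
  private List<CompletableFuture<Long>> openBatch; // guarded by this; null when no batch is open

  public ReadIndexCoalescer(ReadIndexSender sender, long windowMicros) {
    this.sender = sender;
    this.windowMicros = windowMicros;
    this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "readindex-coalescer");
      t.setDaemon(true); // do not keep the JVM alive
      return t;
    });
  }

  /** Joins the open batch (opening one if needed); completes with that batch's ReadIndex. */
  public synchronized CompletableFuture<Long> readIndex() {
    CompletableFuture<Long> result = new CompletableFuture<>();
    if (openBatch == null) {
      openBatch = new ArrayList<>();
      // The first read of a batch starts the batching window.
      timer.schedule(this::sealAndSend, windowMicros, TimeUnit.MICROSECONDS);
    }
    openBatch.add(result);
    return result;
  }

  private void sealAndSend() {
    final List<CompletableFuture<Long>> sealed;
    synchronized (this) {
      sealed = openBatch; // seal: reads arriving from now on open a new batch
      openBatch = null;
    }
    // The request is issued only after sealing, so every read in 'sealed'
    // was invoked before the ReadIndex request was issued.
    CompletableFuture<Long> reply;
    try {
      reply = sender.sendReadIndex();
    } catch (RuntimeException e) {
      reply = CompletableFuture.failedFuture(e);
    }
    // One ReadIndex result completes every read in the sealed batch.
    reply.whenComplete((index, error) -> {
      for (CompletableFuture<Long> read : sealed) {
        if (error != null) {
          read.completeExceptionally(error);
        } else {
          read.complete(index);
        }
      }
    });
  }

  public void close() {
    timer.shutdown();
  }
}
```

For example, `new ReadIndexCoalescer(sender, 500)` uses the 500 microsecond window from the description. Sealing happens under the lock before the request is sent, which is what keeps the coalescing consistent with the rule stated earlier. This sketch does not handle retries or leader changes.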
This idea is similar to the paper
https://www.vldb.org/pvldb/vol18/p2831-giortamis.pdf (https://law-theorem.com/),
with the paper's lightweight "sync" write operation replaced by ReadIndex (which
is also a form of "sync").
Therefore, while RATIS-2403 batches writes together into a single RepliedIndex
to reduce the bottleneck introduced by a rapidly increasing ReadIndex (and the
longer follower waitForAdvance that results), this patch focuses on amortizing
the network latency bottleneck for reads.
was:
Currently, each read triggers its own ReadIndex call. If network overhead is high,
these calls can become the bottleneck.
One improvement is to coalesce concurrent reads into a single ReadIndex call.
Rule: A ReadIndex result may only serve reads whose invocation happened before
the ReadIndex request was logically issued.
{code:java}
t1: read A arrives at follower
t2: read B arrives at follower
t3: follower sends one ReadIndex request for batch [A, B]
t4: leader processes ReadIndex and returns index I
t5: follower waits until its applied index >= I
t6: A and B query local state and complete
{code}
It's not the following:
{code:java}
t1: read A arrives
t2: follower sends ReadIndex request
t3: leader processes it
t4: read B arrives
t5: follower attaches B to A's ReadIndex result
{code}
This can be implemented using a batching window with a small batching interval
(e.g. 500 microseconds or less, depending on the average latency). Reads that
arrive during the batching interval are collected into one window. When the
interval ends, the window is sealed (i.e. no more reads will be added to it) and
a single ReadIndex request is sent that covers all the reads in the sealed window
(e.g. if the window holds 5 read requests, the cost of 1 ReadIndex call is
amortized over all 5 reads).
This idea is similar to the paper
https://www.vldb.org/pvldb/vol18/p2831-giortamis.pdf (https://law-theorem.com/),
with the paper's lightweight "sync" write operation replaced by ReadIndex (which
is also a form of "sync").
Therefore, while RATIS-2403 batches writes together into a single RepliedIndex
to reduce the bottleneck introduced by a rapidly increasing ReadIndex (and the
longer follower waitForAdvance that results), this patch focuses on amortizing
the network latency bottleneck for reads.
> Implement ReadIndex coalescing
> ------------------------------
>
> Key: RATIS-2524
> URL: https://issues.apache.org/jira/browse/RATIS-2524
> Project: Ratis
> Issue Type: Improvement
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)