[ https://issues.apache.org/jira/browse/RATIS-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Andika updated RATIS-2524:
-------------------------------
    Description: 
Currently, each read triggers its own ReadIndex call. When network overhead is 
high, these calls can become the bottleneck.

One improvement is to coalesce multiple reads into a single ReadIndex call.

Rule: A ReadIndex result may only serve reads whose invocation happened before 
the ReadIndex request is logically issued.


{code:java}
t1: read A arrives at follower
t2: read B arrives at follower
t3: follower sends one ReadIndex request for batch [A, B]
t4: leader processes ReadIndex and returns index I
t5: follower applies >= I
t6: A and B query local state and complete 
{code}
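Steps t4 to t6 above can be sketched on the follower side as follows. This is a minimal sketch, assuming a hypothetical AppliedIndexWaiter class (not an existing Ratis class) that the state machine notifies through an equally hypothetical onApplied(index) after each apply. The point it illustrates is that the whole batch waits once for "applied >= I", after which every read in it (A and B) is served from local state.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical follower-side helper (not a Ratis class): futures registered for a
 * read index complete once the local applied index reaches that index.
 */
class AppliedIndexWaiter {
  private long appliedIndex;
  // Pending waiters, keyed by the read index each one is waiting for.
  private final TreeMap<Long, List<CompletableFuture<Long>>> waiters = new TreeMap<>();

  AppliedIndexWaiter(long initialAppliedIndex) {
    this.appliedIndex = initialAppliedIndex;
  }

  /** Step t5: completes once appliedIndex >= readIndex (immediately if already there). */
  synchronized CompletableFuture<Long> waitForApplied(long readIndex) {
    CompletableFuture<Long> future = new CompletableFuture<>();
    if (appliedIndex >= readIndex) {
      future.complete(appliedIndex);
    } else {
      waiters.computeIfAbsent(readIndex, k -> new ArrayList<>()).add(future);
    }
    return future;
  }

  /** Hypothetical hook, called after the state machine applies the entry at index. */
  void onApplied(long index) {
    List<CompletableFuture<Long>> ready = new ArrayList<>();
    long applied;
    synchronized (this) {
      appliedIndex = Math.max(appliedIndex, index);
      applied = appliedIndex;
      // Remove every waiter whose read index is now covered by the applied index.
      Map<Long, List<CompletableFuture<Long>>> covered = waiters.headMap(applied, true);
      covered.values().forEach(ready::addAll);
      covered.clear();
    }
    // Complete outside the lock so dependent reads (step t6) never run while holding it.
    for (CompletableFuture<Long> future : ready) {
      future.complete(applied);
    }
  }
}
{code}

With one ReadIndex result I = 12 for batch [A, B], a single waitForApplied(12) future can be shared by both reads, e.g. batch.thenApply(i -> queryA()) and batch.thenApply(i -> queryB()), where queryA/queryB stand for the local state-machine queries.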

The following is NOT allowed, because read B arrives after the ReadIndex request has already been issued:

{code:java}
t1: read A arrives
t2: follower sends ReadIndex request
t3: leader processes it
t4: read B arrives
t5: follower attaches B to A's ReadIndex result 
{code}
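The rule can be enforced by sealing a batch at the logical moment its ReadIndex request is issued, so a read that arrives afterwards (B in the second timeline) must join a new batch. A minimal sketch, where ReadIndexBatch and BatchRouter are hypothetical names and reads are identified by plain strings for brevity:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical batch: only reads attached before issue() are covered by its ReadIndex. */
class ReadIndexBatch {
  private final List<String> reads = new ArrayList<>();
  private boolean issued;

  /** Attaches a read, or returns false if the ReadIndex request was already issued. */
  synchronized boolean tryAttach(String readId) {
    if (issued) {
      return false;
    }
    reads.add(readId);
    return true;
  }

  /** Marks the logical issue point of the ReadIndex request and returns the covered reads. */
  synchronized List<String> issue() {
    issued = true;
    return Collections.unmodifiableList(new ArrayList<>(reads));
  }
}

/** Hypothetical router: sends each read to the open batch, opening a new one after issue. */
class BatchRouter {
  private ReadIndexBatch current = new ReadIndexBatch();

  synchronized ReadIndexBatch route(String readId) {
    if (!current.tryAttach(readId)) {
      // The current batch's request is already in flight, so a later read
      // cannot reuse its result and must wait for a new ReadIndex.
      current = new ReadIndexBatch();
      current.tryAttach(readId);
    }
    return current;
  }
}
{code}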

This can be implemented using a batching window with a small batching interval 
(e.g. 500 microseconds or less, depending on the average latency). Reads that 
arrive during the batching interval are collected into one ReadIndex batch. When 
the interval ends, the batch is sealed (i.e. no more reads can be added to it) 
and a single ReadIndex request is sent that covers all the reads in the sealed 
window (e.g. if the window holds 5 read requests, one ReadIndex call amortizes 
its cost across all 5). New reads go to the next ReadIndex batch.
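A minimal sketch of the batching window, assuming a hypothetical ReadIndexCoalescer class and a sendReadIndex hook that stands in for the follower's actual ReadIndex RPC to the leader (neither is existing Ratis API). The first read of an empty window schedules the seal; when the interval expires, the open batch is swapped out and exactly one ReadIndex call is issued for all of its reads.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Hypothetical coalescer (not Ratis API): reads submitted within one batching
 * window share a single ReadIndex call. sendReadIndex is an assumed hook that
 * stands in for the follower-to-leader ReadIndex RPC and yields the index I.
 */
class ReadIndexCoalescer {
  private final ScheduledExecutorService scheduler;
  private final long windowMicros;
  private final Supplier<CompletableFuture<Long>> sendReadIndex;
  // Reads in the currently open window; null when no window is open.
  private List<CompletableFuture<Long>> openBatch;

  ReadIndexCoalescer(ScheduledExecutorService scheduler, long windowMicros,
      Supplier<CompletableFuture<Long>> sendReadIndex) {
    this.scheduler = scheduler;
    this.windowMicros = windowMicros;
    this.sendReadIndex = sendReadIndex;
  }

  /** Registers a read; the future completes with the ReadIndex that covers it. */
  synchronized CompletableFuture<Long> submitRead() {
    if (openBatch == null) {
      // The first read of a window opens it and schedules the seal.
      openBatch = new ArrayList<>();
      scheduler.schedule(this::sealAndSend, windowMicros, TimeUnit.MICROSECONDS);
    }
    CompletableFuture<Long> future = new CompletableFuture<>();
    openBatch.add(future);
    return future;
  }

  /** Seals the open window: reads submitted after this point start a new batch. */
  private synchronized List<CompletableFuture<Long>> sealOpenBatch() {
    List<CompletableFuture<Long>> sealed = openBatch;
    openBatch = null;
    return sealed;
  }

  private void sealAndSend() {
    final List<CompletableFuture<Long>> sealed = sealOpenBatch();
    // The request is issued only after sealing, so every read in the batch was
    // invoked before the ReadIndex request, as the rule requires.
    sendReadIndex.get().whenComplete((readIndex, error) -> {
      for (CompletableFuture<Long> future : sealed) {
        if (error != null) {
          future.completeExceptionally(error);
        } else {
          future.complete(readIndex);
        }
      }
    });
  }
}
{code}

The window is opened lazily by the first read, so an idle follower sends nothing. The trade-off is that each read may pay up to windowMicros of extra latency in exchange for fewer ReadIndex round trips; after receiving I, each read still has to wait until the follower has applied up to I before querying local state.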

This idea is similar to the paper 
https://www.vldb.org/pvldb/vol18/p2831-giortamis.pdf (https://law-theorem.com/) 
mentioned in RATIS-2403, with the paper's lightweight "sync" write operation 
replaced by ReadIndex (which is also a form of "sync").

Whereas RATIS-2403 batches writes into a single RepliedIndex to reduce the 
bottleneck introduced by a rapidly increasing ReadIndex (which causes longer 
follower waitForAdvance), this patch focuses on amortizing the network latency 
bottleneck for reads.


> Implement ReadIndex coalescing
> ------------------------------
>
>                 Key: RATIS-2524
>                 URL: https://issues.apache.org/jira/browse/RATIS-2524
>             Project: Ratis
>          Issue Type: Improvement
>          Components: Linearizable Read, server
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)