[ 
https://issues.apache.org/jira/browse/KUDU-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Alves updated KUDU-1188:
------------------------------
    Component/s: consensus

> For snapshot read correctness, enforce simple form of leader leases
> -------------------------------------------------------------------
>
>                 Key: KUDU-1188
>                 URL: https://issues.apache.org/jira/browse/KUDU-1188
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: consensus, tserver
>    Affects Versions: Public beta
>            Reporter: David Alves
>            Assignee: David Alves
>
> Since Raft doesn't allow holes in the log, a new leader is guaranteed to have
> all the writes that preceded its election and to have them in flight when
> elected (meaning MVCC will have those transactions in flight, so a
> snapshot read will wait for them to complete). So, for writes, leases aren't
> really necessary. This is unlike Paxos in Spanner, where there is no
> timestamp propagation, the log might have holes, and leases are required to
> enforce write correctness.
> However, some form of lease is necessary to enforce read consistency, in
> particular in the following case:
> Leader A accepts a write at time 10, which commits and has no following
> writes. It then serves a snapshot read at 15 and crashes.
> Leader B is elected but has a slow clock, which reads 11 when it is ready to
> serve writes. It then accepts a write at time 13.
> The snapshot read at 15 is now broken, because a write now exists at a
> timestamp below 15 that the read did not observe.
> A simple way to avoid this is to have each replica promise, on each ack,
> that if ever elected leader it won't accept writes or serve snapshot reads
> until a certain period, say 2 seconds, has passed since that ack. On the
> leader side, the leader is only allowed to serve snapshot reads up to 2
> seconds after _a majority_ of replicas has acked, which in practice usually
> means 1 replica.
> With such a mechanism in place, if the lease in the example above is 5 time
> units, then leader B wouldn't accept the write at time 13 and would instead
> wait until 15 had passed, so the snapshot read would not be broken.
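
The lease rule described above can be sketched roughly as follows. This is
only an illustration of the scenario in the description, not Kudu code: the
names Replica, Leader and LEASE are hypothetical, time is an abstract integer
as in the example, and it assumes replica B recorded its ack of the write at
local time 10.

```python
from dataclasses import dataclass

LEASE = 5  # hypothetical lease length, in the abstract time units of the example


@dataclass
class Replica:
    """Follower-side state: the promise made on the most recent ack."""
    no_serve_before: int = 0  # local time that must pass before acting as leader

    def ack(self, local_now: int) -> None:
        # On each ack, promise not to accept writes or serve snapshot reads
        # as a future leader until LEASE has passed since this ack.
        self.no_serve_before = max(self.no_serve_before, local_now + LEASE)

    def can_accept_write(self, local_now: int) -> bool:
        # A newly elected leader honours the promise it made as a follower.
        return local_now > self.no_serve_before


@dataclass
class Leader:
    """Leader-side state: how far ahead snapshot reads may be served."""
    lease_expiry: int = 0

    def on_majority_ack(self, sent_at: int) -> None:
        # The lease runs from the time of the request that a majority acked.
        self.lease_expiry = max(self.lease_expiry, sent_at + LEASE)

    def can_serve_snapshot_read(self, read_ts: int) -> bool:
        return read_ts <= self.lease_expiry


# Scenario from the description (assumed values).
leader_a = Leader()
replica_b = Replica()

replica_b.ack(local_now=10)           # B acks A's write at time 10
leader_a.on_majority_ack(sent_at=10)  # A's lease now covers reads up to 15

read_at_15_allowed = leader_a.can_serve_snapshot_read(15)
write_at_13_allowed = replica_b.can_accept_write(13)   # B, now leader, slow clock
write_at_16_allowed = replica_b.can_accept_write(16)   # after 15 has passed
```

In this sketch A may serve the snapshot read at 15, while B refuses the write
at 13 and only accepts writes once its clock has passed 15. Clock drift
between the two sides is deliberately not modelled here.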



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
