David Alves created KUDU-1703:
Summary: Handle lagging replicas for snapshot reads
Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: David Alves
When we fix safe time advancement, replicas will start to block on snapshot
scans for arbitrary amounts of time, waiting to have a consistent view of the
world at that timestamp before serving the scan.
This will be a serious problem for lagging replicas, which might be several
seconds or even minutes behind. Moreover, in the absence of writes, the same
will happen even for non-lagging replicas, which will have their safe times
updated only when the leader heartbeats.
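To illustrate the blocking condition described above, here is a toy model (not Kudu code). It assumes timestamps are plain integers and that a replica may serve a snapshot scan only once its safe time has reached the scan's timestamp. The names `ReplicaSafeTime`, `advance`, `can_serve`, and `lag` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSafeTime:
    """Toy model of a replica's safe time (hypothetical, integer timestamps)."""

    safe_time: int = 0

    def advance(self, ts: int) -> None:
        # Called when a replicated write or a leader heartbeat carries a
        # newer timestamp. Safe time never moves backwards.
        self.safe_time = max(self.safe_time, ts)

    def can_serve(self, snapshot_ts: int) -> bool:
        # A snapshot scan at snapshot_ts is consistent only once safe time
        # has reached that timestamp.
        return snapshot_ts <= self.safe_time

    def lag(self, snapshot_ts: int) -> int:
        # How far safe time still has to advance before the scan can run.
        return max(0, snapshot_ts - self.safe_time)


replica = ReplicaSafeTime()
replica.advance(100)             # last heartbeat or write seen at ts=100
blocked = not replica.can_serve(150)
remaining = replica.lag(150)     # the scan waits until safe time covers this gap
```

In this model an idle but healthy replica blocks a scan at timestamp 150 just as a lagging one would, because nothing advances its safe time until the next heartbeat arrives.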
We need to at least make sure that:
- Blocked scanner threads are not starving other work.
- If the replica's safe time is lagging by a lot, the replica refuses to do the
  scan.
We might also consider other optimizations (like pinging the leader).
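The two requirements above could be sketched roughly as follows. This is only one possible approach under stated assumptions, not a proposed implementation: `decide_scan`, `SafeTimeTracker`, `max_lag`, and `timeout_s` are hypothetical names, timestamps are plain integers, and a bounded wait stands in for whatever mechanism would actually keep scanner threads from starving other work.

```python
import enum
import threading


class ScanDecision(enum.Enum):
    SERVE = "serve"    # safe time already covers the snapshot timestamp
    WAIT = "wait"      # small lag: worth a bounded wait
    REFUSE = "refuse"  # large lag: reject so the client can try elsewhere


def decide_scan(snapshot_ts, safe_time, max_lag):
    """Classify a snapshot scan by how far safe time trails its timestamp."""
    if snapshot_ts <= safe_time:
        return ScanDecision.SERVE
    if snapshot_ts - safe_time > max_lag:
        return ScanDecision.REFUSE
    return ScanDecision.WAIT


class SafeTimeTracker:
    """Lets a scanner wait for safe time with an upper bound on the wait."""

    def __init__(self, safe_time=0):
        self._safe_time = safe_time
        self._cond = threading.Condition()

    def advance(self, ts):
        # Called on a replicated write or a leader heartbeat.
        with self._cond:
            if ts > self._safe_time:
                self._safe_time = ts
                self._cond.notify_all()

    def wait_until(self, snapshot_ts, timeout_s):
        # Returns True if safe time reached snapshot_ts within timeout_s
        # seconds, False otherwise, so the thread never blocks indefinitely.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._safe_time >= snapshot_ts, timeout=timeout_s
            )
```

A scan handler would call `decide_scan` first and fall back to `wait_until` only for the `WAIT` case, returning an error to the client if the wait times out.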