[
https://issues.apache.org/jira/browse/KUDU-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Alves updated KUDU-1703:
------------------------------
Description:
When we fix safe time advancement, replicas will start to block on snapshot
scans for arbitrary amounts of time, waiting to have a consistent view of the
world at that timestamp before serving the scan. This will be a serious problem
for lagging replicas, which might be several seconds or even minutes behind.
Moreover, in the absence of writes, the same will happen even for non-lagging
replicas, which will have their safe times updated only when the leader
heartbeats.
We need to at least make sure that:
- Blocked scanner threads are not starving other work.
- If the replica's safe time is lagging by a lot, the replica refuses to do the
scan.
We might also consider other optimizations (like pinging the leader).
was:
When we fix safe time advancement, replicas will start to block on snapshot
scans for arbitrary amounts of time, waiting to have a consistent view of the
world at that timestamp before serving the scan.
This will be a serious problem for lagging replicas, which might be several
seconds or even minutes behind. Moreover in the absence of writes, the same
will happen even for non-lagging replicas, which will have their safe times
updated only when the leader heartbeats.
We need to at least make sure that:
- Blocked scanner threads are not starving other work.
- If the replica's safe time is lagging by a lot, the replica refuses to do the
scan.
We might also consider other optimizations (like pinging the leader).
Summary: Handle snapshot reads that might block indefinitely (was:
Handle lagging replicas for snapshot reads)
> Handle snapshot reads that might block indefinitely
> ---------------------------------------------------
>
> Key: KUDU-1703
> URL: https://issues.apache.org/jira/browse/KUDU-1703
> Project: Kudu
> Issue Type: Sub-task
> Affects Versions: 1.1.0
> Reporter: David Alves
> Assignee: David Alves
>
> When we fix safe time advancement, replicas will start to block on snapshot
> scans for arbitrary amounts of time, waiting to have a consistent view of the
> world at that timestamp before serving the scan. This will be a serious
> problem for lagging replicas, which might be several seconds or even minutes
> behind.
> Moreover, in the absence of writes, the same will happen even for non-lagging
> replicas, which will have their safe times updated only when the leader
> heartbeats.
> We need to at least make sure that:
> - Blocked scanner threads are not starving other work.
> - If the replica's safe time is lagging by a lot, the replica refuses to do
> the scan.
> We might also consider other optimizations (like pinging the leader).
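
The two requirements above (do not let blocked scans starve other work, and
refuse scans when safe time lags by a lot) can be illustrated with a minimal
sketch. This is not Kudu code: the names ReplicaState, ScanDecision,
decide_snapshot_scan and the max_lag_ms threshold are all hypothetical, and
timestamps are simplified to plain integer milliseconds. It only shows the
admission decision a replica could make before serving a snapshot scan.

```python
from dataclasses import dataclass
from enum import Enum


class ScanDecision(Enum):
    SERVE = "serve"    # safe time already covers the snapshot timestamp
    WAIT = "wait"      # small lag: defer the scan until safe time advances
    REJECT = "reject"  # large lag: refuse instead of blocking indefinitely


@dataclass(frozen=True)
class ReplicaState:
    # Hypothetical view of a replica: the timestamp up to which it is
    # known to have a consistent view of the world.
    safe_time_ms: int


def decide_snapshot_scan(replica, snapshot_ts_ms, max_lag_ms):
    """Decide how to handle a snapshot scan at snapshot_ts_ms.

    max_lag_ms is an assumed tuning knob: the largest gap between the
    requested snapshot timestamp and the replica's safe time that we
    are willing to wait out.
    """
    lag_ms = snapshot_ts_ms - replica.safe_time_ms
    if lag_ms <= 0:
        return ScanDecision.SERVE
    if lag_ms > max_lag_ms:
        return ScanDecision.REJECT
    # The caller would ideally park the request without tying up a
    # scanner thread, so that waiting scans do not starve other work.
    return ScanDecision.WAIT
```

In this sketch a rejected scan could be retried by the client against
another replica or the leader; how waiting scans are parked and woken up is
left out on purpose, since the issue does not specify it.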