[ https://issues.apache.org/jira/browse/KUDU-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954331#comment-15954331 ]
Todd Lipcon commented on KUDU-1454:
-----------------------------------
I don't think we need fault-tolerant scans -- that's orthogonal.
I do think, though, that we need to be setting a propagated timestamp in all of
the scanners, or else when we read from a non-leader we're liable to lose
read-your-writes, which has confusing semantics.
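A toy Java model of the read-your-writes concern above, offered as a sketch rather than anything from the Kudu client. `Replica`, `Client`, `scanNaive`, and `scanWithPropagatedTimestamp` are hypothetical names. The client assigns timestamps itself purely for simplicity; in a real cluster the timestamps would come from the tablet servers. The model assumes a follower that lags the leader. A scan that reads at the follower's own safe time misses the client's writes. A scan that carries the client's last propagated timestamp first waits until the follower has applied everything up to that timestamp.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model (not the Kudu API) of scanning a lagging follower. */
public class ReadYourWritesSketch {

    /** A replica with applied write timestamps plus writes not yet replicated to it. */
    static final class Replica {
        private final List<Long> applied = new ArrayList<>();
        private final Deque<Long> pending = new ArrayDeque<>();

        /** Highest applied timestamp; writes arrive in increasing timestamp order. */
        long safeTime() {
            return applied.isEmpty() ? 0L : applied.get(applied.size() - 1);
        }

        void apply(long ts) { applied.add(ts); }

        void enqueue(long ts) { pending.add(ts); }

        /** Applies replicated writes until safeTime() >= ts or nothing is pending. */
        void waitForSafeTime(long ts) {
            while (safeTime() < ts && !pending.isEmpty()) {
                applied.add(pending.poll());
            }
        }

        /** Number of writes visible in a snapshot taken at ts. */
        long countAt(long ts) {
            return applied.stream().filter(t -> t <= ts).count();
        }
    }

    /** Toy client that remembers the highest timestamp it has observed. */
    static final class Client {
        long lastPropagatedTimestamp = 0L;
        private long clock = 0L;

        void write(Replica leader, Replica follower) {
            long ts = ++clock;
            leader.apply(ts);
            follower.enqueue(ts); // the follower lags behind the leader
            lastPropagatedTimestamp = Math.max(lastPropagatedTimestamp, ts);
        }

        /** Ignores the propagated timestamp: reads at the replica's own safe time. */
        long scanNaive(Replica r) {
            return r.countAt(r.safeTime());
        }

        /** Sends the propagated timestamp: the replica catches up before reading. */
        long scanWithPropagatedTimestamp(Replica r) {
            r.waitForSafeTime(lastPropagatedTimestamp);
            return r.countAt(lastPropagatedTimestamp);
        }
    }

    public static void main(String[] args) {
        Replica leader = new Replica();
        Replica follower = new Replica();
        Client client = new Client();

        client.write(leader, follower);
        client.write(leader, follower);

        System.out.println("leader:            " + client.scanNaive(leader));   // 2
        System.out.println("follower, naive:   " + client.scanNaive(follower)); // 0: own writes missing
        System.out.println("follower, with ts: " + client.scanWithPropagatedTimestamp(follower)); // 2
    }
}
```

In this model, waiting on the follower trades some latency for consistency. The alternative, routing every scan to the leader, is what causes the locality problem described in the issue.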
> Spark and MR jobs running without scan locality
> -----------------------------------------------
>
> Key: KUDU-1454
> URL: https://issues.apache.org/jira/browse/KUDU-1454
> Project: Kudu
> Issue Type: Bug
> Components: client, perf, spark
> Affects Versions: 0.8.0
> Reporter: Todd Lipcon
> Priority: Critical
>
> Spark (and, according to [~danburkert], now MR as well) adds all of the
> locations of a tablet as split locations. This makes sense, except that the
> Java client currently always scans the leader replica. So in many cases we
> schedule a task that is "local" to a follower, and it then has to do a
> remote scan.
> This makes Spark queries take about twice as long on replicated tables as
> on unreplicated tables, and I think it is a regression on the MR side.
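A rough sketch of the locality problem described in the issue, in the same toy style. Every replica is advertised as a split location, the scheduler places each task on one of them, and a leader-only scanner must go remote whenever the task lands on a follower. `ReplicaChoice`, `Tablet`, `scannedHost`, and `remoteScans` are hypothetical and are not part of the Kudu Java client. The scheduler that rotates through the replicas is an assumption standing in for a real one.

```java
import java.util.List;

/** Toy model (not the Kudu API) of task placement versus replica selection. */
public class ScanLocalitySketch {

    /** Hypothetical scanner replica-selection policy. */
    enum ReplicaChoice { LEADER_ONLY, CLOSEST }

    /** Hosts holding a tablet's replicas; by convention index 0 is the leader. */
    static final class Tablet {
        final List<String> replicaHosts;

        Tablet(List<String> replicaHosts) { this.replicaHosts = replicaHosts; }

        String leader() { return replicaHosts.get(0); }
    }

    /** Host the scanner reads from for a task running on taskHost. */
    static String scannedHost(Tablet t, String taskHost, ReplicaChoice choice) {
        if (choice == ReplicaChoice.CLOSEST && t.replicaHosts.contains(taskHost)) {
            return taskHost; // a replica lives on the task's own host
        }
        return t.leader();
    }

    /** Counts tasks whose scan has to cross the network. */
    static int remoteScans(List<Tablet> tablets, ReplicaChoice choice) {
        int remote = 0;
        for (int i = 0; i < tablets.size(); i++) {
            Tablet t = tablets.get(i);
            // All replicas are split locations, so the task may run on any of
            // them; this toy scheduler simply rotates through them.
            String taskHost = t.replicaHosts.get(i % t.replicaHosts.size());
            if (!scannedHost(t, taskHost, choice).equals(taskHost)) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        List<Tablet> tablets = List.of(
            new Tablet(List.of("h1", "h2", "h3")),
            new Tablet(List.of("h2", "h3", "h1")),
            new Tablet(List.of("h3", "h1", "h2")));

        System.out.println("LEADER_ONLY remote scans: "
            + remoteScans(tablets, ReplicaChoice.LEADER_ONLY)); // 2
        System.out.println("CLOSEST remote scans:     "
            + remoteScans(tablets, ReplicaChoice.CLOSEST));     // 0
    }
}
```

With three tablets at replication factor 3, this toy placement leaves two of the three leader-only scans remote and none under a closest-replica policy. Reading from a follower this way is the case where, per the comment above, the scanner also needs a propagated timestamp.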
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)