Todd Lipcon created KUDU-1454:
---------------------------------

             Summary: Spark and MR jobs running without scan locality
                 Key: KUDU-1454
                 URL: https://issues.apache.org/jira/browse/KUDU-1454
             Project: Kudu
          Issue Type: Bug
          Components: client, perf, spark
    Affects Versions: 0.8.0
            Reporter: Todd Lipcon
            Priority: Critical


Spark (and, according to [~danburkert], now MR as well) adds all of the replica
locations of a tablet as split locations. This makes sense, except that the Java
client currently always scans the leader replica. So in many cases we schedule a
task which is "local" to a follower, and it then has to do a remote scan against
the leader.
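
To illustrate the mismatch, here is a minimal sketch in Python. It is not Kudu
client or Spark code: the `Replica` type, the host names, and all function names
are hypothetical, and the only behavior it assumes is what is described above
(every replica is advertised as a split location, but scans always go to the
leader).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for one replica of a tablet."""
    host: str
    is_leader: bool


def split_locations_all(replicas: List[Replica]) -> List[str]:
    # Behavior described in this issue: every replica host is advertised
    # to the scheduler as a preferred location for the split.
    return [r.host for r in replicas]


def split_locations_leader_only(replicas: List[Replica]) -> List[str]:
    # One possible mitigation (an assumption, not a confirmed fix): advertise
    # only the leader, falling back to all hosts if no leader is known.
    leaders = [r.host for r in replicas if r.is_leader]
    return leaders or [r.host for r in replicas]


def scan_is_local(task_host: str, replicas: List[Replica]) -> bool:
    # The client always scans the leader, so the scan is local only when
    # the task was scheduled on the leader's host.
    leader = next((r.host for r in replicas if r.is_leader), None)
    return task_host == leader


replicas = [
    Replica("node-a", True),
    Replica("node-b", False),
    Replica("node-c", False),
]

# A task placed on any advertised host is "local" from the scheduler's
# point of view, but only the leader's host gives a local scan.
local_hosts = [h for h in split_locations_all(replicas)
               if scan_is_local(h, replicas)]
```

With three replicas, only one of the three advertised hosts actually yields a
local scan. The other way to close the gap would be to let the client scan the
replica closest to the task rather than restricting the split locations.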

This makes Spark queries take about twice as long on replicated tables as on
unreplicated ones, and I think it is a regression on the MR side.



