[ 
https://issues.apache.org/jira/browse/KUDU-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318642#comment-15318642
 ] 

Todd Lipcon commented on KUDU-1454:
-----------------------------------

[~armoredMojo] [[email protected]] Is anyone looking into this? It seems 
important to address, since it really hurts Spark performance.

> Spark and MR jobs running without scan locality
> -----------------------------------------------
>
>                 Key: KUDU-1454
>                 URL: https://issues.apache.org/jira/browse/KUDU-1454
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, perf, spark
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> Spark (and, according to [~danburkert], now MR as well) adds all of a 
> tablet's replica locations as split locations. This makes sense, except that 
> the Java client currently always scans the leader replica. So in many cases 
> we schedule a task that is "local" to a follower, and it ends up doing a 
> remote scan.
> This makes Spark queries take about twice as long on replicated tables as on 
> unreplicated tables, and I think it is a regression on the MR side.
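
To make the mismatch described above concrete, here is a minimal sketch in plain Java. It does not use the Kudu client API: Replica, ScanLocalitySketch, and the host names are hypothetical, introduced only for illustration. It assumes, as the description says, that the scan is always served by the leader. Advertising only the leader's host is shown as one possible mitigation, not as the fix that was or will be adopted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a tablet replica; not a Kudu client type.
final class Replica {
    final String host;
    final boolean leader;

    Replica(String host, boolean leader) {
        this.host = host;
        this.leader = leader;
    }
}

final class ScanLocalitySketch {
    // Behavior described in the issue: every replica host is advertised
    // to the scheduler as a split location.
    static List<String> allReplicaHosts(List<Replica> replicas) {
        List<String> hosts = new ArrayList<>();
        for (Replica r : replicas) {
            hosts.add(r.host);
        }
        return hosts;
    }

    // One possible mitigation (an assumption, not the confirmed fix):
    // advertise only the host of the leader, since that is where the
    // scan is actually served.
    static List<String> leaderOnlyHosts(List<Replica> replicas) {
        List<String> hosts = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.leader) {
                hosts.add(r.host);
            }
        }
        return hosts;
    }

    // True when a task placed on taskHost reads from a leader on the same host.
    static boolean scanIsLocal(List<Replica> replicas, String taskHost) {
        for (Replica r : replicas) {
            if (r.leader) {
                return r.host.equals(taskHost);
            }
        }
        return false; // no leader known, so nothing can be read locally
    }

    public static void main(String[] args) {
        List<Replica> tablet = Arrays.asList(
                new Replica("host-a", true),
                new Replica("host-b", false),
                new Replica("host-c", false));

        // With all replica hosts advertised, the scheduler may pick a follower.
        for (String host : allReplicaHosts(tablet)) {
            System.out.println(host + " local scan: " + scanIsLocal(tablet, host));
        }
        System.out.println("leader-only split locations: " + leaderOnlyHosts(tablet));
    }
}
```

In this sketch, a task placed on either follower host counts as "local" to the scheduler but still reads from the leader over the network, which is the remote scan described in the issue.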



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
