[ https://issues.apache.org/jira/browse/HIVE-11265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-11265:
------------------------------------
    Description: 
Running q27 with split-waves 0.9 on 10 nodes x 16 executors, I get 140 mappers 
reading store_sales, and ~5 more assorted vertices.
When running the query repeatedly, one would expect good locality, i.e. the 
same splits (files+stripes) being processed on the same nodes most of the time.
However, this is only the case for 40-50% of the stripes in my experience. When 
the query is run 10 times in a row, an average split (file+stripe) is read on 
~4 machines. Some are actually read on a different machine every run :)

This affects cache hit ratio.
Understandably in real scenarios we won't get 100% locality, but we should not 
be getting bad locality in simple cases like this.

  was:
Running q27 with split-waves 0.9 on 10 nodes x 16 executors, I get 140 mappers 
reading store_sales, and ~5 more assorted vertices.
When running the query repeatedly, one would expect good locality, i.e. the 
same stripes being processed on the same nodes most of the time.
However, this is only the case for 40-50% of the stripes in my experience. When 
the query is run 10 times in a row, an average split (file+stripe) is read on 
~4 machines. Some are actually read on a different machine every run :)

This affects cache hit ratio.
Understandably in real scenarios we won't get 100% locality, but we should not 
be getting bad locality in simple cases like this.


> LLAP: investigate locality issues
> ---------------------------------
>
>                 Key: HIVE-11265
>                 URL: https://issues.apache.org/jira/browse/HIVE-11265
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Siddharth Seth
>
> Running q27 with split-waves 0.9 on 10 nodes x 16 executors, I get 140 
> mappers reading store_sales, and ~5 more assorted vertices.
> When running the query repeatedly, one would expect good locality, i.e. the 
> same splits (files+stripes) being processed on the same nodes most of the 
> time.
> However, this is only the case for 40-50% of the stripes in my experience. 
> When the query is run 10 times in a row, an average split (file+stripe) is 
> read on ~4 machines. Some are actually read on a different machine every run :)
> This affects cache hit ratio.
> Understandably in real scenarios we won't get 100% locality, but we should 
> not be getting bad locality in simple cases like this.
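
For anyone wanting to reproduce the numbers above, here is a minimal sketch of how the per-split locality figures could be computed from task placement records gathered over repeated runs. It is not taken from Hive or Tez: the function name `locality_stats`, the `(run_id, split_id, node)` record shape, and the `majority` threshold used to interpret "most of the time" are all assumptions made for illustration, and it assumes each split is read exactly once per run.

```python
from collections import Counter, defaultdict


def locality_stats(assignments, majority=0.5):
    """Summarize split-to-node locality across repeated runs.

    assignments: iterable of (run_id, split_id, node) tuples (hypothetical
    record shape), one per split read, where split_id identifies a
    file+stripe pair.
    majority: assumed threshold; a split counts as "mostly local" when its
    most frequent node handled more than this share of its reads.
    """
    # For each split, count how many times each node read it.
    per_split = defaultdict(Counter)
    for _run_id, split_id, node in assignments:
        per_split[split_id][node] += 1

    total = len(per_split)
    if total == 0:
        return {"splits": 0, "avg_nodes_per_split": 0.0,
                "mostly_local": 0.0, "never_repeated": 0.0}

    # Average number of distinct machines that read a given split.
    avg_nodes = sum(len(c) for c in per_split.values()) / total

    # Share of splits whose dominant node handled most of the reads.
    mostly_local = sum(
        1 for c in per_split.values()
        if c.most_common(1)[0][1] / sum(c.values()) > majority
    ) / total

    # Share of splits that landed on a different machine on every read.
    never_repeated = sum(
        1 for c in per_split.values()
        if sum(c.values()) > 1 and len(c) == sum(c.values())
    ) / total

    return {"splits": total, "avg_nodes_per_split": avg_nodes,
            "mostly_local": mostly_local, "never_repeated": never_repeated}


# Made-up sample: one split with perfect locality, one that moves every run.
sample = [
    (0, "f1:s0", "n1"), (1, "f1:s0", "n1"), (2, "f1:s0", "n1"),
    (0, "f1:s1", "n1"), (1, "f1:s1", "n2"), (2, "f1:s1", "n3"),
]
stats = locality_stats(sample)
```

With real placement data, `avg_nodes_per_split` corresponds to the "~4 machines" figure and `mostly_local` to the "40-50%" figure, under the assumptions stated above.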



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
