Sergey Shelukhin created HIVE-11265:
---------------------------------------

             Summary: LLAP: investigate locality issues
                 Key: HIVE-11265
                 URL: https://issues.apache.org/jira/browse/HIVE-11265
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergey Shelukhin
            Assignee: Siddharth Seth


Running q27 with split-waves 0.9 on 10 nodes x 16 executors, I get 140 mappers 
reading store_sales, and ~5 more assorted vertices.
When running the query repeatedly, one would expect good locality, i.e. the 
same stripes being processed on the same nodes most of the time.
However, this is only the case for 40-50% of the stripes in my experience. When 
the query is run 10 times in a row, an average split (file+stripe) is read on 
~4 machines. Some are actually read on a different machine every run :)
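
For anyone who wants to reproduce these numbers, here is a rough sketch of how 
they could be computed. It assumes the per-run task placement has already been 
extracted into (run, split, node) records; the record layout and the 
`locality_stats` helper are hypothetical and not part of Hive or LLAP.

```python
from collections import defaultdict


def locality_stats(assignments):
    """Summarize split locality across repeated runs of one query.

    `assignments` is an iterable of (run_id, split_id, node) tuples, where
    split_id identifies a file+stripe pair. This layout is an assumption
    made for illustration, not an LLAP log format.
    """
    nodes_per_split = defaultdict(set)
    runs_per_split = defaultdict(set)
    for run_id, split_id, node in assignments:
        nodes_per_split[split_id].add(node)
        runs_per_split[split_id].add(run_id)

    total = len(nodes_per_split)
    if total == 0:
        raise ValueError("no assignments given")

    # Average number of distinct machines that read each split.
    avg_nodes = sum(len(n) for n in nodes_per_split.values()) / total
    # Splits that were always read on the same machine.
    stable = sum(1 for n in nodes_per_split.values() if len(n) == 1)
    # Splits read more than once that landed on a different machine every run.
    scattered = sum(
        1
        for split_id, n in nodes_per_split.items()
        if len(n) > 1 and len(n) == len(runs_per_split[split_id])
    )
    return {
        "avg_nodes_per_split": avg_nodes,
        "stable_fraction": stable / total,
        "scattered_fraction": scattered / total,
    }


if __name__ == "__main__":
    # Made-up placement data: three splits over three runs.
    sample = [
        (0, "f1:0", "node1"), (1, "f1:0", "node1"), (2, "f1:0", "node1"),
        (0, "f1:1", "node2"), (1, "f1:1", "node3"), (2, "f1:1", "node4"),
        (0, "f2:0", "node1"), (1, "f2:0", "node1"), (2, "f2:0", "node5"),
    ]
    print(locality_stats(sample))
```

With perfect locality, avg_nodes_per_split would be 1 and stable_fraction 
would be 1.0; the observations above correspond to roughly 4 and 0.4-0.5.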

This hurts the cache hit ratio.
Understandably, in real scenarios we won't get 100% locality, but we should not 
be getting bad locality in simple cases like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
