Shaofeng SHI created KYLIN-2165:
-----------------------------------

             Summary: Use hive table statistics data to get the total count
                 Key: KYLIN-2165
                 URL: https://issues.apache.org/jira/browse/KYLIN-2165
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
            Reporter: Shaofeng SHI
            Assignee: Shaofeng SHI
             Fix For: v1.6.0


Kylin runs a count on the intermediate flat Hive table to get the total row 
number, which it then uses to redistribute the table.

According to Hive's wiki, Hive automatically collects table statistics when 
an "insert overwrite" statement runs, so a subsequent "select count(*)" is 
very fast. Kylin, however, executes "INSERT OVERWRITE DIRECTORY 
'/kylin/row_count' SELECT count(*) from", which still starts an MR/Tez job 
and makes the step take longer.

Changing the SQL to a plain "select count(*)", or using the Hive API to read 
the statistics, would save that cost.
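
As a rough illustration of the "Hive API" option, the sketch below reads the 
row count from a table's parameter map, where Hive keeps its basic statistics 
under the "numRows" key. It is only a sketch under assumptions: the class and 
method names (HiveStatsRowCount, rowCountFromStats) are hypothetical and not 
part of Kylin, the map is assumed to have been fetched from the metastore 
already, and negative or unparsable values are treated as "statistic not 
available" so the caller can fall back to the count query.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not Kylin code: extracts the row count from the
// parameter map of a Hive table (assumed to be fetched from the metastore).
final class HiveStatsRowCount {

    // Key under which Hive records the row-count statistic of a table.
    static final String NUM_ROWS_KEY = "numRows";

    private HiveStatsRowCount() {
    }

    // Returns the row count if the statistic is present and usable,
    // otherwise an empty result so the caller can fall back to count(*).
    static OptionalLong rowCountFromStats(Map<String, String> tableParams) {
        if (tableParams == null) {
            return OptionalLong.empty();
        }
        String raw = tableParams.get(NUM_ROWS_KEY);
        if (raw == null) {
            return OptionalLong.empty();
        }
        try {
            long rows = Long.parseLong(raw.trim());
            // Assumption: a negative value means the statistic is unknown.
            return rows >= 0 ? OptionalLong.of(rows) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            // Malformed value: treat the statistic as unavailable.
            return OptionalLong.empty();
        }
    }
}
```

A caller would use the returned value when it is present and run the existing 
count query otherwise. Whether the stored statistic is guaranteed to be up to 
date for the intermediate table is not checked here and would need to be 
verified before relying on it.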



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
