[jira] [Created] (KYLIN-2165) Use hive table statistics data to get the total count

Shaofeng SHI (JIRA) Sun, 06 Nov 2016 01:12:07 -0700

Shaofeng SHI created KYLIN-2165:
-----------------------------------

             Summary: Use hive table statistics data to get the total count
                 Key: KYLIN-2165
                 URL: https://issues.apache.org/jira/browse/KYLIN-2165
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
            Reporter: Shaofeng SHI
            Assignee: Shaofeng SHI
             Fix For: v1.6.0



Kylin runs a count on the intermediate flat Hive table to get the total row 
number, and then uses that number to redistribute the table.

According to Hive's wiki, Hive automatically collects table statistics when 
an "insert overwrite" statement is run, so a subsequent "select count(*)" will 
be very fast. Kylin, however, executes "INSERT OVERWRITE DIRECTORY 
'/kylin/row_count' SELECT count(*) from", which still causes an MR/Tez job to 
be started and makes the step take longer.

Simply changing the SQL to "select count(*)", or using the Hive API to get the 
statistics, would save that cost.
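
As a rough illustration of the "use the statistics" option, the sketch below 
reads the row count from a table's parameters and only falls back to a count 
query when no usable statistic is present. It assumes the parameters have 
already been fetched from the Hive metastore as a string-to-string map and 
that the row count is stored under the "numRows" key; the function names and 
the run_count_query callback are hypothetical and are not part of Kylin.

```python
from typing import Callable, Mapping, Optional

# Assumption: Hive stores the basic row-count statistic under this key
# in the table parameters returned by the metastore.
ROW_COUNT_KEY = "numRows"


def row_count_from_stats(table_params: Mapping[str, str]) -> Optional[int]:
    """Return the row count recorded in the table statistics, or None
    when the statistic is missing, non-numeric, or negative."""
    raw = table_params.get(ROW_COUNT_KEY)
    if raw is None:
        return None
    try:
        count = int(raw)
    except ValueError:
        return None
    # A negative value is treated here as "statistics not available".
    return count if count >= 0 else None


def total_row_count(
    table_params: Mapping[str, str],
    run_count_query: Callable[[], int],
) -> int:
    """Prefer the statistic; run the (hypothetical) count query only
    when no usable statistic exists."""
    count = row_count_from_stats(table_params)
    return count if count is not None else run_count_query()
```

With parameters such as {"numRows": "42"} the callback is never invoked, so 
no MR/Tez job would be needed; with an empty map the sketch falls back to 
the query. A real implementation would also need to confirm that the stored 
statistics are up to date before trusting them.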



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)