Shaofeng SHI created KYLIN-2165:
-----------------------------------
Summary: Use hive table statistics data to get the total count
Key: KYLIN-2165
URL: https://issues.apache.org/jira/browse/KYLIN-2165
Project: Kylin
Issue Type: Improvement
Components: Job Engine
Reporter: Shaofeng SHI
Assignee: Shaofeng SHI
Fix For: v1.6.0
Kylin runs a count on the intermediate flat Hive table to get the total row
number, and then uses that number to redistribute the table.
According to Hive's wiki, Hive automatically collects table statistics when
an "insert overwrite" statement runs, so a subsequent "select count(*)" is
very fast. Kylin, however, executes "INSERT OVERWRITE DIRECTORY
'/kylin/row_count' SELECT count(*) from", which still causes an MR/Tez job to
be started and makes the step take longer.
Changing the SQL to a plain "select count(*)", or using the Hive API to read
the statistics, would save that cost.
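
To illustrate the "Hive API" option, here is a minimal sketch of how a row
count might be read from table statistics once the table parameters have been
fetched. It assumes the parameters arrive as a plain string map and that the
metastore records the count under "numRows" with an accuracy flag under
"COLUMN_STATS_ACCURATE" (either "true" or a JSON object with a "BASIC_STATS"
entry, depending on the Hive version). The function name is hypothetical and
this is not Kylin's actual code; fetching the parameters is left out.

```python
import json
from typing import Mapping, Optional


def row_count_from_stats(params: Mapping[str, str]) -> Optional[int]:
    """Return the row count recorded in the table parameters, or None
    when the statistics are missing or not marked as accurate.

    Assumption: `params` is the table-parameter map from the Hive
    metastore, using the keys "numRows" and "COLUMN_STATS_ACCURATE".
    """
    raw = params.get("numRows")
    if raw is None:
        return None
    try:
        count = int(raw)
    except ValueError:
        return None
    if count < 0:
        # A negative value is treated here as "statistics not collected".
        return None

    accurate = params.get("COLUMN_STATS_ACCURATE", "")
    if accurate.strip().lower() == "true":
        # Older style: a bare "true"/"false" flag.
        return count
    try:
        parsed = json.loads(accurate)
    except ValueError:
        return None
    # Newer style: a JSON object such as {"BASIC_STATS": "true"}.
    if isinstance(parsed, dict) and str(parsed.get("BASIC_STATS", "")).lower() == "true":
        return count
    return None
```

When the function returns None, the caller would fall back to running the
count query, so a table without usable statistics still gets a correct number.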
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)