Shaofeng SHI created KYLIN-2165:
-----------------------------------

             Summary: Use hive table statistics data to get the total count
                 Key: KYLIN-2165
                 URL: https://issues.apache.org/jira/browse/KYLIN-2165
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
            Reporter: Shaofeng SHI
            Assignee: Shaofeng SHI
             Fix For: v1.6.0

Kylin runs a count on the intermediate flat Hive table to get the total row number, which it then uses to redistribute the table.

According to Hive's wiki, Hive automatically collects table statistics when an "insert overwrite" statement runs, so a subsequent "select count(*)" is very fast. Kylin, however, executes "INSERT OVERWRITE DIRECTORY '/kylin/row_count' SELECT count(*) from", which still causes an MR/Tez job to be started and makes the step take longer.

Changing the SQL to a plain "select count(*)", or using the Hive API to read the statistics, would save that cost.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
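
A minimal sketch of the "read the statistics first" idea, not Kylin's actual implementation. It assumes the table parameters have already been fetched (for example from the metastore or from "DESCRIBE FORMATTED" output) into a plain dict, and that Hive's "numRows" and "COLUMN_STATS_ACCURATE" parameters are present when statistics were collected. The helper names row_count_from_stats and fallback_count_sql are hypothetical, and the accuracy check is a simplification.

```python
from typing import Mapping, Optional


def row_count_from_stats(params: Mapping[str, str]) -> Optional[int]:
    """Return the row count stored in Hive table parameters, or None.

    None means the statistics are missing or not marked accurate, so the
    caller should fall back to running a count query.
    """
    raw = params.get("numRows")
    if raw is None:
        return None
    try:
        count = int(raw)
    except ValueError:
        return None
    # Assumption: a negative value means statistics were never gathered.
    if count < 0:
        return None
    # Assumption: older Hive stores "true"; newer versions store JSON such
    # as {"BASIC_STATS":"true"}. A substring check covers both loosely.
    accurate = "true" in params.get("COLUMN_STATS_ACCURATE", "").lower()
    return count if accurate else None


def fallback_count_sql(table: str) -> str:
    """Plain count query, without INSERT OVERWRITE DIRECTORY."""
    return f"SELECT count(*) FROM {table}"


if __name__ == "__main__":
    params = {"numRows": "1500", "COLUMN_STATS_ACCURATE": "true"}
    count = row_count_from_stats(params)
    if count is None:
        print(fallback_count_sql("flat_table"))  # hypothetical table name
    else:
        print(count)
```

With statistics present, no query is issued at all. The plain count is kept only as a fallback for tables whose statistics are missing or stale.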