[
https://issues.apache.org/jira/browse/KYLIN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shao Feng Shi updated KYLIN-4315:
---------------------------------
Fix Version/s: v3.1.0
[~xiacongling] could you please check the review comments in the JIRA? Thx!
> Use metadata numRows in beeline client for quick row counting
> -------------------------------------------------------------
>
> Key: KYLIN-4315
> URL: https://issues.apache.org/jira/browse/KYLIN-4315
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Congling Xia
> Assignee: Congling Xia
> Priority: Major
> Fix For: v3.1.0
>
>
> Hi, I found that in `BeelineHiveClient`, the method `getHiveTableRows` runs
> "select count(*) from <tb_name>" to count table rows. The method is
> invoked in the flat intermediate table redistribution step of cube building.
> The row count can instead be read from the table statistics in the metastore,
> which costs much less time than scanning all rows of the Hive table. Since
> intermediate tables are created and populated by Kylin, the statistics are
> automatically calculated and stored in the metastore when
> `[hive.stats.autogather|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.autogather]`
> is enabled (which is the default setting for Hive).
> See the Hive wiki for more detail about the `numRows` statistic:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE]
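For illustration only, here is a minimal sketch of the idea in the description: look for `numRows` among the table parameters and fall back to the count query when it is not usable. It is not the actual Kylin patch. The class and method names (`NumRowsSketch`, `parseNumRows`, `rowCountQuery`) are hypothetical, and the row layout (three columns, with the parameter name in the second column and its value in the third, as in `DESCRIBE FORMATTED` output) is an assumption that should be checked against the Hive version in use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

public class NumRowsSketch {

    /**
     * Scans rows assumed to be shaped like DESCRIBE FORMATTED output
     * (col_name, data_type, comment) and returns the numRows value if
     * one is present and non-negative. Hypothetical helper.
     */
    public static OptionalLong parseNumRows(List<String[]> rows) {
        for (String[] row : rows) {
            if (row.length < 3 || row[1] == null || row[2] == null) {
                continue; // header or separator rows carry no key/value pair
            }
            if ("numRows".equals(row[1].trim())) {
                try {
                    long n = Long.parseLong(row[2].trim());
                    // Assumption: a negative value means the stat is unavailable.
                    return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }

    /** Fallback query, the same statement quoted in the issue description. */
    public static String rowCountQuery(String table) {
        return "select count(*) from " + table;
    }

    public static void main(String[] args) {
        // Sample rows standing in for a JDBC result set; values are made up.
        List<String[]> rows = Arrays.asList(
                new String[] {"Table Parameters:", null, null},
                new String[] {"", "numRows             ", "42                  "});
        OptionalLong numRows = parseNumRows(rows);
        if (numRows.isPresent()) {
            System.out.println("numRows from metastore: " + numRows.getAsLong());
        } else {
            System.out.println("fallback: " + rowCountQuery("some_table"));
        }
    }
}
```

Keeping the count query as a fallback means the step still works when `hive.stats.autogather` is disabled or the statistic is absent.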