[ https://issues.apache.org/jira/browse/KYLIN-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114469#comment-17114469 ]
ASF subversion and git services commented on KYLIN-4315:
--------------------------------------------------------
Commit 24f0063daacb6732aa06f5abbe6d198f570ecf95 in kylin's branch
refs/heads/master from xiacongling
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=24f0063 ]
KYLIN-4315 use metadata numRows in beeline client for quick row counting
> Use metadata numRows in beeline client for quick row counting
> -------------------------------------------------------------
>
> Key: KYLIN-4315
> URL: https://issues.apache.org/jira/browse/KYLIN-4315
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Congling Xia
> Assignee: Congling Xia
> Priority: Major
> Fix For: v3.1.0
>
>
> Hi, I found that in `BeelineHiveClient`, the method `getHiveTableRows` runs
> "select count(*) from <tb_name>" to count table rows. The method is
> invoked in the flat intermediate table redistribution step of cube building.
> The row count can instead be read from the metastore, which costs much less
> time than scanning all rows in the Hive table. Since intermediate tables are
> created and populated by Kylin, statistics are automatically calculated and
> stored in the metastore when
> `[hive.stats.autogather|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.autogather]`
> is enabled (which is the default setting for Hive).
> See the Hive wiki for more detail about the `numRows` stat:
> [https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables%E2%80%93ANALYZE]
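
The approach described in the issue can be sketched roughly as follows. This
is not the code from the commit; it is a minimal illustration under stated
assumptions. The class `NumRowsSketch` and the methods `parseNumRows` and
`rowCount` are hypothetical names. The sketch assumes the rows of a
`DESCRIBE FORMATTED <tb_name>` result have already been fetched (for example
over JDBC) as string arrays, that the `numRows` parameter name appears in one
column with its value in the next (possibly padded with spaces), and that a
negative or unparsable value means no usable statistic is available.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

public class NumRowsSketch {

    /**
     * Looks for a cell equal to "numRows" in rows shaped like
     * DESCRIBE FORMATTED output and parses the cell that follows it.
     * Returns empty when the stat is absent, negative, or not a number
     * (treating negative as "not gathered" is an assumption of this sketch).
     */
    public static OptionalLong parseNumRows(List<String[]> describeRows) {
        for (String[] row : describeRows) {
            for (int i = 0; i + 1 < row.length; i++) {
                if (row[i] != null && "numRows".equals(row[i].trim()) && row[i + 1] != null) {
                    try {
                        long n = Long.parseLong(row[i + 1].trim());
                        return n >= 0 ? OptionalLong.of(n) : OptionalLong.empty();
                    } catch (NumberFormatException e) {
                        return OptionalLong.empty();
                    }
                }
            }
        }
        return OptionalLong.empty();
    }

    /**
     * Prefers the metastore stat; only runs the expensive fallback
     * (e.g. "select count(*) from <tb_name>") when the stat is unusable.
     */
    public static long rowCount(List<String[]> describeRows, LongSupplier countQuery) {
        return parseNumRows(describeRows).orElseGet(countQuery);
    }

    public static void main(String[] args) {
        // Hand-written sample rows, not real Hive output.
        List<String[]> rows = Arrays.asList(
                new String[] {"Table Parameters:", null, null},
                new String[] {"", "numRows             ", "1234                "});
        // The fallback supplier stands in for the count(*) query.
        System.out.println(rowCount(rows, () -> -1L));
    }
}
```

Keeping the fallback as a `LongSupplier` means the full table scan is only
issued when the metastore has no usable `numRows` value, for instance when
`hive.stats.autogather` has been disabled.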
--
This message was sent by Atlassian Jira
(v8.3.4#803005)