[jira] [Commented] (HIVE-29203) get_aggr_stats_for doesn't aggregate stats when direct sql batch retrieve is enabled

Zhihua Deng (Jira) Tue, 16 Sep 2025 02:55:25 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-29203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020604#comment-18020604
 ]


Zhihua Deng commented on HIVE-29203:
------------------------------------

If we remove the batch processing at 
[https://github.com/apache/hive/blob/4bb08099d91acbefee73a449a36abb1ecd2b5925/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1882-L1891],
we may also need to consider the memory usage when enableBitVector || 
enableKll is true, and the restriction on IN list size in aggrStatsUseDB.

> get_aggr_stats_for doesn't aggregate stats when direct sql batch retrieve is 
> enabled
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-29203
>                 URL: https://issues.apache.org/jira/browse/HIVE-29203
>             Project: Hive
>          Issue Type: Bug
>          Components: Standalone Metastore
>            Reporter: Zhihua Deng
>            Priority: Major
>
> When metastore.direct.sql.batch.size > 0 and the number of partition names 
> or columns passed to get_aggr_stats_for exceeds 
> metastore.direct.sql.batch.size, the AggrStats returned by 
> get_aggr_stats_for may contain un-merged stats for the same column. The 
> aggregated stats are then incorrect, which may cause CBO to generate an 
> outdated execution plan.
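
To illustrate the problem described above, here is a minimal sketch of the 
merge step that batched retrieval would need: per-batch results are combined 
so that each column appears exactly once. It is not the MetaStoreDirectSql 
code. The class AggrStatsMergeSketch, the ColStat type and its fields are 
hypothetical and simplified, and only null counts and min/max are merged; 
distinct-value estimates and histograms (the enableBitVector / enableKll 
cases) are left out because they cannot be combined by simple arithmetic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names do not come from Hive.
class AggrStatsMergeSketch {

    /** Simplified per-column statistics (assumption: long-typed column). */
    static final class ColStat {
        final String colName;
        final long numNulls;
        final long low;
        final long high;

        ColStat(String colName, long numNulls, long low, long high) {
            this.colName = colName;
            this.numNulls = numNulls;
            this.low = low;
            this.high = high;
        }

        /** Combines two partial aggregates for the same column. */
        ColStat merge(ColStat other) {
            return new ColStat(
                colName,
                numNulls + other.numNulls,      // null counts add up
                Math.min(low, other.low),       // overall minimum
                Math.max(high, other.high));    // overall maximum
        }
    }

    /**
     * Merges the results of several batches so that the output holds one
     * entry per column name, in first-seen order.
     */
    static List<ColStat> mergeByColumn(List<List<ColStat>> perBatchResults) {
        Map<String, ColStat> merged = new LinkedHashMap<>();
        for (List<ColStat> batch : perBatchResults) {
            for (ColStat stat : batch) {
                // Insert the first occurrence, merge any later occurrence.
                merged.merge(stat.colName, stat, ColStat::merge);
            }
        }
        return new ArrayList<>(merged.values());
    }
}
```

Without such a step, simply concatenating the per-batch lists would return 
two entries for a column whose partitions were split across batches, which 
matches the un-merged stats described in this issue.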



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
