[ https://issues.apache.org/jira/browse/HIVE-20246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alice Fan updated HIVE-20246:
-----------------------------
Description:
By default, Hive collects stats when running operations such as alter table
partition(s), create table, and create external table. However, collecting
stats requires the Metastore to list all files under the table directory, and
this file listing can be very expensive, particularly on filesystems like S3.
HIVE-18743 made it possible to use the DO_NOT_UPDATE_STATS table property to
selectively prevent stats collection.
This Jira aims to make MetaStoreUtils.updatePartitionStatsFast honor the
DO_NOT_UPDATE_STATS table property. With this change, users can selectively
prevent stats collection for alter table partition(s) operations at the table
level. For example, after running
  ALTER TABLE S3_Table SET TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='TRUE');
the Metastore will not collect stats for S3_Table when running
  ALTER TABLE S3_Table ADD PARTITION (key1=val1, key2=val2);
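The check described above can be sketched in Java as follows. This is a hypothetical, simplified illustration, not the actual MetaStoreUtils code: the class name, the helper statsUpdateDisabled, the Supplier-based file listing, and treating the property as enabled when its value is "true" (case-insensitive) are all assumptions; numFiles and totalSize are used only as example stat keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Hive's actual implementation) of a fast
 * partition-stats update that honors the DO_NOT_UPDATE_STATS table property.
 */
public class PartitionStatsSketch {
    // Table property name, as used in the Jira description.
    static final String DO_NOT_UPDATE_STATS = "DO_NOT_UPDATE_STATS";

    // Assumption: the property counts as set when its value is "true",
    // ignoring case, so both 'TRUE' and 'true' disable stats collection.
    static boolean statsUpdateDisabled(Map<String, String> tableParams) {
        return tableParams != null
            && Boolean.parseBoolean(tableParams.get(DO_NOT_UPDATE_STATS));
    }

    // Hypothetical stand-in for the fast stats update. The expensive file
    // listing (e.g. on S3) is deferred behind a Supplier so it only runs
    // when stats collection is allowed. Returns true if stats were updated.
    static boolean updatePartitionStatsFast(Map<String, String> tableParams,
                                            Map<String, String> partitionParams,
                                            Supplier<long[]> listFileSizes) {
        if (statsUpdateDisabled(tableParams)) {
            return false; // skip the file listing entirely
        }
        long[] fileSizes = listFileSizes.get(); // the costly listing step
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        partitionParams.put("numFiles", Long.toString(fileSizes.length));
        partitionParams.put("totalSize", Long.toString(total));
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> tbl = new HashMap<>();
        tbl.put(DO_NOT_UPDATE_STATS, "TRUE");
        Map<String, String> part = new HashMap<>();
        boolean updated =
            updatePartitionStatsFast(tbl, part, () -> new long[] {10L, 20L});
        System.out.println(updated + " " + part); // prints "false {}"
    }
}
```

Deferring the listing behind a Supplier means that, when the property is set, the costly directory listing is never triggered, which is the motivation for the change.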
was:
By default, Hive collects stats when running operations such as alter
partitioned table, alter unpartitioned table, create table, and create external
table. However, collecting stats requires the Metastore to list all files under
the table directory, and this file listing can be very expensive, particularly
on filesystems like S3.
This Jira aims to introduce DO_NOT_UPDATE_STATS into the above operations to
give users a configurable option to stop collecting stats at the table level.
For example, after running
  ALTER TABLE S3_Table SET TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='TRUE');
the Metastore should stop collecting stats for S3_Table.
> Configurable collecting stats by using DO_NOT_UPDATE_STATS table property
> -------------------------------------------------------------------------
>
> Key: HIVE-20246
> URL: https://issues.apache.org/jira/browse/HIVE-20246
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Alice Fan
> Assignee: Alice Fan
> Priority: Minor
> Fix For: 4.0.0
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)