[
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531204#comment-14531204
]
Dongwook Kwon commented on HIVE-10631:
--------------------------------------
I still don't understand why line
1363 (MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true))
tries to update stats before the metastore knows about any partitions: the
table has just been created, which means no MSCK REPAIR PARTITIONS has run,
even if it is an existing external table.
However, if the intention of line 1363, and of HIVE-3959, is to update fast
stats regardless of the fact that the table was just created, then I believe
line 1363 should be something like below:
{code}
FileStatus[] fileStatus = wh.getFileStatusesForUnpartitionedTable(db, tbl);
MetaStoreUtils.updateUnpartitionedTableStatsFast(tbl, fileStatus,
fileStatus.length == 0, false);
{code}
Otherwise it should be like this, at least to avoid scanning folders
unnecessarily:
{code}
MetaStoreUtils.updateUnpartitionedTableStatsFast(tbl, null, true, false);
{code}
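To illustrate the cost difference, here is a minimal, self-contained sketch. It does not use Hive's classes: FakeWarehouse, updateEager and updateLazy are hypothetical stand-ins, and it assumes (as the description says) that the listing is ignored whenever newDir is true. It only shows that the eager variant pays for a listing it never uses, while the lazy variant skips it.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical stand-ins, not Hive's actual classes or signatures. */
public class FastStatsSketch {

    /** Hypothetical warehouse that counts how often its directory is listed. */
    public static class FakeWarehouse {
        public final List<Long> fileSizes = new ArrayList<>();
        public int listCalls = 0;

        public List<Long> listFileSizes() {
            listCalls++; // on S3 this would be a slow, possibly recursive listing
            return new ArrayList<>(fileSizes);
        }
    }

    /** Assumption: when newDir is true the listing is ignored and the size is 0. */
    public static long totalSize(List<Long> sizes, boolean newDir) {
        if (newDir || sizes == null) {
            return 0L;
        }
        long total = 0L;
        for (long s : sizes) {
            total += s;
        }
        return total;
    }

    /** Eager variant: always lists the directory, even if the result is unused. */
    public static long updateEager(FakeWarehouse wh, boolean newDir) {
        List<Long> sizes = wh.listFileSizes();
        return totalSize(sizes, newDir);
    }

    /** Lazy variant: lists the directory only when the result will be used. */
    public static long updateLazy(FakeWarehouse wh, boolean newDir) {
        List<Long> sizes = newDir ? null : wh.listFileSizes();
        return totalSize(sizes, newDir);
    }

    public static void main(String[] args) {
        FakeWarehouse wh = new FakeWarehouse();
        wh.fileSizes.add(128L);
        wh.fileSizes.add(256L);

        updateEager(wh, true);
        System.out.println("eager listings with newDir=true: " + wh.listCalls);

        wh.listCalls = 0;
        updateLazy(wh, true);
        System.out.println("lazy listings with newDir=true: " + wh.listCalls);
    }
}
```

Both variants return the same stats when newDir is true; the only difference is whether the directory is listed at all.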
Just my thought.
> create_table_core method has invalid update for Fast Stats
> ----------------------------------------------------------
>
> Key: HIVE-10631
> URL: https://issues.apache.org/jira/browse/HIVE-10631
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 0.13.0, 1.0.0
> Reporter: Dongwook Kwon
> Priority: Minor
>
> The HiveMetaStore.create_table_core method calls
> MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather
> is on. However, for a partitioned table, this
> updateUnpartitionedTableStatsFast call scans the warehouse directory and
> doesn't seem to use the result.
> "Fast Stats" was implemented by HIVE-3959
> https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
> From create_table_core method
> {code}
> if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
>     !MetaStoreUtils.isView(tbl)) {
>   if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
>     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
>   } else { // Partitioned table with no partitions.
>     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
>   }
> }
> {code}
> Particularly Line 1363: // Partitioned table with no partitions.
> {code}
> MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
> {code}
> This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable,
> and then MetaStoreUtils.updateUnpartitionedTableStatsFast does nothing with
> the result because the newDir flag is always true.
> The impact of this bug is minor with an HDFS warehouse location
> (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse
> location, especially with large existing partitions.
> The impact is also heightened by HIVE-6727 when the warehouse location is
> S3: it could recursively scan the wrong S3 directory and do nothing with the
> result. I will add more details about these cases in the comments.