[
https://issues.apache.org/jira/browse/HIVE-26788?focusedWorklogId=830976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-830976
]
ASF GitHub Bot logged work on HIVE-26788:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Dec/22 09:58
Start Date: 05/Dec/22 09:58
Worklog Time Spent: 10m
Work Description: SourabhBadhya commented on code in PR #3812:
URL: https://github.com/apache/hive/pull/3812#discussion_r1039378601
##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/StatsUpdater.java:
##########
@@ -73,6 +69,9 @@ public void gatherStats(CompactionInfo ci, HiveConf conf, String userName, Strin
           sb.append(")");
         }
         sb.append(" compute statistics");
+        if (ci.isMinorCompaction()) {
+          sb.append(" noscan");
Review Comment:
Minor compaction is not expected to compact many files, so in most scenarios
only the number of files changes after a minor compaction. A major compaction,
by contrast, is a large update that happens only once in a while, so it needs
to refresh all statistics to keep the metadata current. The idea was therefore
to do a fast statistics update after a minor compaction and a complete update
after a major compaction.
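
To make the intent concrete, here is a minimal, self-contained sketch of the command-building logic described above. It is not the actual StatsUpdater code: the class name, the CompactionType enum, and the buildAnalyzeCommand helper are hypothetical, and identifier quoting is omitted for brevity. Only the " compute statistics" and " noscan" suffixes are taken from the diff.

```java
public class CompactionStatsCommand {

    /** Hypothetical stand-in for the two compaction types discussed in the review. */
    enum CompactionType { MINOR, MAJOR }

    /**
     * Builds an ANALYZE statement for a table or partition.
     * For a minor compaction the statement ends in "noscan", so only
     * file-level statistics are refreshed; a major compaction gets a
     * full "compute statistics" run.
     */
    static String buildAnalyzeCommand(String dbName, String tableName,
                                      String partitionSpec, CompactionType type) {
        StringBuilder sb = new StringBuilder("analyze table ");
        sb.append(dbName).append('.').append(tableName);
        // partitionSpec is assumed to be preformatted, e.g. ds='2022-12-05'
        if (partitionSpec != null && !partitionSpec.isEmpty()) {
            sb.append(" partition(").append(partitionSpec).append(")");
        }
        sb.append(" compute statistics");
        if (type == CompactionType.MINOR) {
            // Fast path: skip scanning the data files.
            sb.append(" noscan");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildAnalyzeCommand(
            "default", "orders", "ds='2022-12-05'", CompactionType.MINOR));
        System.out.println(buildAnalyzeCommand(
            "default", "orders", null, CompactionType.MAJOR));
    }
}
```

With this shape, the choice between the fast and the complete statistics update stays in one place and depends only on the compaction type.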
Issue Time Tracking
-------------------
Worklog Id: (was: 830976)
Time Spent: 1h (was: 50m)
> Update stats of table/partition after minor compaction using noscan operation
> -----------------------------------------------------------------------------
>
> Key: HIVE-26788
> URL: https://issues.apache.org/jira/browse/HIVE-26788
> Project: Hive
> Issue Type: Improvement
> Reporter: Sourabh Badhya
> Assignee: Sourabh Badhya
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently, statistics are not updated after minor compaction, since minor
> compaction changes only a few statistics (such as the number of files in
> the table/partition and the total size of the table/partition). It is better
> to use the NOSCAN operation for minor compaction, since NOSCAN updates
> statistics faster and still refreshes the relevant fields, namely the
> number of files and the total size of the table/partition.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)