pvary commented on a change in pull request #2329:
URL: https://github.com/apache/iceberg/pull/2329#discussion_r593241677
##########
File path:
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -296,6 +302,13 @@ private void setHmsTableParameters(String newMetadataLocation, Table tbl, Map<St
parameters.remove(hive_metastoreConstants.META_TABLE_STORAGE);
}
+ // Set the basic statistics
+ parameters.put(StatsSetupConst.NUM_FILES, summary.getOrDefault(SnapshotSummary.TOTAL_DATA_FILES_PROP, "0"));
+ parameters.put(StatsSetupConst.ROW_COUNT, summary.getOrDefault(SnapshotSummary.TOTAL_RECORDS_PROP, "0"));
+ parameters.put(StatsSetupConst.TOTAL_SIZE, summary.getOrDefault(SnapshotSummary.TOTAL_FILE_SIZE_PROP, "0"));
+ // we don't have the uncompressed file sizes, so we use the totalSize (size on disk) as an estimate for rawDataSize
+ parameters.put(StatsSetupConst.RAW_DATA_SIZE, summary.getOrDefault(SnapshotSummary.TOTAL_FILE_SIZE_PROP, "0"));
Review comment:
Based on my latest investigation, we might get away with setting only
`TOTAL_SIZE` and leaving `RAW_DATA_SIZE` unset. Judging by this code, I expect
that we would still avoid the file listing as long as `TOTAL_SIZE` is not `0`:
https://github.com/apache/hive/blob/a0034284fe02a5012f883704fcd57652519a4cd5/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L202
Could you please check this out?
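
For illustration, the suggested variant could look roughly like the sketch below. It is not the PR's code: the class `BasicStatsSketch` and the method `basicStats` are hypothetical, and the string constants are local stand-ins whose values are assumed to match `StatsSetupConst` in Hive and `SnapshotSummary` in Iceberg. Whether Hive really skips the file listing with only `TOTAL_SIZE` set still needs to be verified against the linked `BasicStats` code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HiveTableOperations.
final class BasicStatsSketch {
  // Assumed values of the Hive StatsSetupConst keys.
  static final String NUM_FILES = "numFiles";
  static final String ROW_COUNT = "numRows";
  static final String TOTAL_SIZE = "totalSize";

  // Assumed values of the Iceberg SnapshotSummary keys.
  static final String TOTAL_DATA_FILES_PROP = "total-data-files";
  static final String TOTAL_RECORDS_PROP = "total-records";
  static final String TOTAL_FILE_SIZE_PROP = "total-files-size";

  private BasicStatsSketch() {
  }

  // Builds the basic statistics from a snapshot summary, falling back to "0"
  // for missing entries. rawDataSize is deliberately not set.
  static Map<String, String> basicStats(Map<String, String> summary) {
    Map<String, String> parameters = new HashMap<>();
    parameters.put(NUM_FILES, summary.getOrDefault(TOTAL_DATA_FILES_PROP, "0"));
    parameters.put(ROW_COUNT, summary.getOrDefault(TOTAL_RECORDS_PROP, "0"));
    parameters.put(TOTAL_SIZE, summary.getOrDefault(TOTAL_FILE_SIZE_PROP, "0"));
    return parameters;
  }
}
```

With a summary that contains `total-files-size`, the resulting map carries a non-zero `totalSize` and no `rawDataSize` entry at all, which is the case the question above is about.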
Thanks,
Peter