[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=856663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-856663 ]
ASF GitHub Bot logged work on HIVE-27158:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/23 08:17
Start Date: 13/Apr/23 08:17
Worklog Time Spent: 10m
Work Description: InvisibleProgrammer commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1165171624
##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java:
##########
@@ -245,6 +248,44 @@ default boolean canProvideBasicStatistics() {
return false;
}
+ /**
+ * Return some col statistics (Lower bounds, Upper bounds, Null value counts, NaN, total counts) calculated by
+ * the underlying storage handler implementation.
+ * @param table
+ * @return A List of Column Statistics Objects, can be null
+ */
+ default List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table table) {
+ return null;
+ }
+
+ /**
+ * Set column stats for non-native tables
+ * @param table
+ * @param colStats
+ * @return boolean
+ */
+ default boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table,
+ List<ColumnStatistics> colStats) {
+ return false;
+ }
+
+ /**
+ * Check if the storage handler can provide col statistics.
+ * @param tbl
+ * @return true if the storage handler can supply the col statistics
+ */
+ default boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+ return false;
+ }
+
+ /**
+ * Check if the storage handler can set col statistics.
+ * @return true if the storage handler can set the col statistics
+ */
+ default boolean canSetColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
Review Comment:
I don't have a definitive answer; I'm just thinking out loud:
We have a pair of methods, `canSetColStatistics` and `setColStatistics`.
Could we design this so that `setColStatistics` cannot be called when the
statistics cannot be set?
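One possible shape for this, as a rough sketch only and not what the PR does: instead of a `canX`/`X` pair, the handler could hand out a writer object only when writing is supported, so a caller has nothing to call otherwise. Every name below (`TableStub`, `ColumnStatisticsStub`, `ColStatsWriter`, `StorageHandlerSketch`, `colStatsWriter`, and the two handler classes) is hypothetical and stands in for the real Hive types; none of it is part of `HiveStorageHandler`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for Hive's Table and ColumnStatistics (not the real classes).
final class TableStub {
  final String name;
  TableStub(String name) { this.name = name; }
}

final class ColumnStatisticsStub {
  final String columnName;
  ColumnStatisticsStub(String columnName) { this.columnName = columnName; }
}

// Hypothetical capability: only a handler that supports writing returns one of these.
interface ColStatsWriter {
  boolean write(List<ColumnStatisticsStub> colStats);
}

// Hypothetical replacement for the canSetColStatistics/setColStatistics pair.
interface StorageHandlerSketch {
  // Default: no writer, so there is no set method to call by mistake.
  default Optional<ColStatsWriter> colStatsWriter(TableStub table) {
    return Optional.empty();
  }
}

// A handler that does not support storing column statistics.
final class NoStatsHandler implements StorageHandlerSketch {}

// A handler that supports it; here it just keeps the stats in memory.
final class InMemoryStatsHandler implements StorageHandlerSketch {
  final List<ColumnStatisticsStub> stored = new ArrayList<>();

  @Override
  public Optional<ColStatsWriter> colStatsWriter(TableStub table) {
    return Optional.of(colStats -> {
      stored.addAll(colStats);
      return true;
    });
  }
}

final class ColStatsCapabilityDemo {
  // The caller can reach write() only through the Optional.
  static boolean storeIfSupported(StorageHandlerSketch handler, TableStub table,
      List<ColumnStatisticsStub> stats) {
    return handler.colStatsWriter(table).map(w -> w.write(stats)).orElse(false);
  }

  public static void main(String[] args) {
    TableStub table = new TableStub("t");
    List<ColumnStatisticsStub> stats =
        Collections.singletonList(new ColumnStatisticsStub("id"));
    System.out.println(storeIfSupported(new InMemoryStatsHandler(), table, stats));
    System.out.println(storeIfSupported(new NoStatsHandler(), table, stats));
  }
}
```

The trade-off is an extra type and a change to the interface shape; it only makes sense if the same pattern would also be applied to `canProvideColStatistics`/`getColStatistics`.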
Issue Time Tracking
-------------------
Worklog Id: (was: 856663)
Time Spent: 10h (was: 9h 50m)
> Store hive columns stats in puffin files for iceberg tables
> -----------------------------------------------------------
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10h
> Remaining Estimate: 0h
>