deniskuzZ commented on code in PR #8202:
URL: https://github.com/apache/iceberg/pull/8202#discussion_r1941767628


##########
format/puffin-spec.md:
##########
@@ -181,6 +181,23 @@ for Puffin v1.
 [roaring-bitmap-portable-serialization]: https://github.com/RoaringBitmap/RoaringFormatSpec?tab=readme-ov-file#extension-for-64-bit-implementations
 [roaring-bitmap-general-layout]: https://github.com/RoaringBitmap/RoaringFormatSpec?tab=readme-ov-file#general-layout
 
+#### `hive-column-statistics-obj` blob type
+
+A serialized form of Hive ColumnStatsObject.
+
+The ColumnStatsObject supports Histograms, NDV, Min and Max values, Number of nulls, Number of trues, column name, type.
+A full list of supported statistics is listed in the table here:
+[ColumnStatistics](https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ColumnStatistics)

Review Comment:
   Hi @rdblue,
   thanks for checking this PR!
    
   `The partition statistics files provide a way to aggregate those beyond the file level`
   Does Iceberg provide built-in support for getting aggregated column stats? I mean, is there a library or service that generates partition statistics files with aggregated column stats?
   AFAIK we only do this for basic stats: https://github.com/apache/iceberg/pull/11216
    
   If yes, could you please point me to the code where that is done? I was under the impression that, of the column stats, only NDV is calculated and stored in partition files.
   
   What about the remaining column stats:
   1. bitvectors - used to improve stats estimations for the IN operator
   2. histogram - histogram statistics, which are particularly useful for skewed data and range predicates (KLL data sketches)
   3. numTrue/numFalse
   4. avgColLen
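
As a rough illustration of item 2 (a sketch only, not Hive's or Iceberg's actual implementation): the snippet below compares a selectivity estimate for `col <= 5` derived from min/max alone against one derived from equi-depth histogram boundaries on skewed data. All function names and the data set are hypothetical, and the boundaries are computed exactly from sorted values, whereas a KLL sketch would only approximate them.

```python
from bisect import bisect_right


def uniform_selectivity(lo, hi, bound):
    """Estimate P(col <= bound) from min/max only, assuming uniformly spread values."""
    if bound < lo:
        return 0.0
    if bound >= hi:
        return 1.0
    return (bound - lo) / (hi - lo)


def equi_depth_boundaries(values, buckets):
    """Return the buckets-1 interior boundaries, computed exactly from sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // buckets] for i in range(1, buckets)]


def histogram_selectivity(boundaries, bound):
    """Estimate P(col <= bound) as the share of interior boundaries that are <= bound."""
    buckets = len(boundaries) + 1
    return bisect_right(boundaries, bound) / buckets


# Hypothetical skewed column: 900 rows hold the value 1, 100 rows are spread up to 1000.
values = [1] * 900 + list(range(10, 1010, 10))
true_selectivity = sum(v <= 5 for v in values) / len(values)

uniform_est = uniform_selectivity(min(values), max(values), 5)
hist_est = histogram_selectivity(equi_depth_boundaries(values, 10), 5)
print(true_selectivity, uniform_est, hist_est)
```

On this data the histogram-based estimate lands far closer to the true selectivity than the min/max-based one, because the uniform assumption ignores the mass concentrated at the low end.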



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

