nastra commented on PR #14234:
URL: https://github.com/apache/iceberg/pull/14234#issuecomment-4479470244

   > Hi @nastra, could you please clarify where we would store NDVs? Apologies 
if this is already covered in a design doc I may have missed. Also, would it be 
possible to extend the design to optionally support partition-level statistics 
structures such as bitmap-based sketches for NDV estimation and histograms 
(e.g., KLL sketches)? Thank you!
   
   @deniskuzZ those types of stats are not handled by this design, as they are 
stored separately in Puffin files (you might want to take a look at 
`NDVSketchUtil`). They are then used later, e.g. by Spark in 
`SparkScan#estimateStatistics`.
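   To make the storage split concrete, here is a minimal, self-contained Java 
sketch of how a reader might pull NDV estimates out of statistics-file blob 
metadata. It assumes the Puffin spec's `apache-datasketches-theta-v1` blob type 
and its optional `ndv` property. The `Blob` record and the `ndvByField` helper 
are hypothetical stand-ins, not Iceberg's actual `BlobMetadata` API or the code 
in `SparkScan`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NdvLookup {
  // Blob type the Puffin spec defines for Apache DataSketches Theta sketches.
  static final String THETA_BLOB_TYPE = "apache-datasketches-theta-v1";
  // Property key the Puffin spec suggests for the NDV estimate derived from the sketch.
  static final String NDV_KEY = "ndv";

  // Hypothetical stand-in for per-blob metadata recorded for a statistics file;
  // NOT Iceberg's org.apache.iceberg.BlobMetadata interface.
  record Blob(String type, List<Integer> fieldIds, Map<String, String> properties) {}

  // Returns field id -> NDV estimate for single-column Theta blobs carrying an "ndv" property.
  static Map<Integer, Long> ndvByField(List<Blob> blobs) {
    Map<Integer, Long> result = new HashMap<>();
    for (Blob blob : blobs) {
      // Skip non-Theta blobs and multi-column sketches.
      if (!THETA_BLOB_TYPE.equals(blob.type()) || blob.fieldIds().size() != 1) {
        continue;
      }
      String ndv = blob.properties().get(NDV_KEY);
      if (ndv != null) {
        result.put(blob.fieldIds().get(0), Long.parseLong(ndv));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Blob> blobs = List.of(
        new Blob(THETA_BLOB_TYPE, List.of(1), Map.of(NDV_KEY, "42")),
        new Blob(THETA_BLOB_TYPE, List.of(2), Map.of()),
        new Blob("some-other-blob-type", List.of(3), Map.of(NDV_KEY, "7")));
    System.out.println(ndvByField(blobs)); // prints {1=42}
  }
}
```

   The point of the sketch is that NDVs live alongside, not inside, the 
partition-level stats this design covers: an engine reads them from the Puffin 
blob metadata when estimating statistics.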


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
