raunaqmorarka commented on issue #14375:
URL: https://github.com/apache/iceberg/issues/14375#issuecomment-3430824480

   > It looks to me this would be better handled in a separate place. Trino needs a stats file rather than a partition stats file.
   
   This is not a Trino-only problem; it's a problem for any engine that wants to read table statistics for cost-based optimizations in the planner. We already have a statistics file for per-column NDV stats at the table level. However, we are forced to read manifest files to get all the other statistics (row count, column size, min, max, null count). Reading manifests is an expensive operation for large tables, or for tables with lots of small manifests.

   Now that Iceberg already has a concept of partition statistics, which lets us avoid reading manifests to build table statistics, it is natural to extend that concept to unpartitioned tables as well. An unpartitioned table can be thought of as a partitioned table with a single partition.

   Trino can, of course, create its own stats file for unpartitioned tables. However, that is an unsatisfactory solution because it makes the statistics engine-specific. It also seems like poor design to have one mechanism for getting statistics for partitioned tables and an entirely different one for unpartitioned tables.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.