raunaqmorarka commented on issue #14375: URL: https://github.com/apache/iceberg/issues/14375#issuecomment-3430824480
> It looks to me this would be better handled in a separate place. Trino needs a stats file rather than a partition stats file.

This is not a Trino-only problem; it's a problem for any engine that wants to read table statistics for cost-based optimizations in the planner. We already have a statistics file for per-column NDV stats at the table level. However, we are forced to read manifest files to get all the other statistics (row count, column size, min, max, null count). Reading manifests is an expensive operation for large tables, or for tables with lots of small manifests.

Now that Iceberg already has a concept of partition statistics, which helps us avoid reading manifests to build table statistics, it is natural to extend that concept to unpartitioned tables as well. An unpartitioned table can be thought of as a partitioned table with a single partition.

Trino can of course create its own stats file for unpartitioned tables. However, that is an unsatisfactory solution because it becomes an engine-specific thing. It also looks like poor design to have one solution for getting statistics for a partitioned table and an entirely different solution for unpartitioned tables.
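To illustrate the "single partition" framing, here is a rough sketch of rolling per-partition records up into table-level totals. The `PartitionStats` and `TableStats` types, their field names, and the sample numbers are all hypothetical and simplified for illustration; this is not Iceberg's actual partition stats schema or any engine's API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PartitionStats:
    """Hypothetical, simplified per-partition record (not the Iceberg schema)."""
    partition: Tuple  # partition values; empty tuple for an unpartitioned table
    record_count: int
    total_size_bytes: int


@dataclass(frozen=True)
class TableStats:
    """Hypothetical table-level totals a planner might consume."""
    record_count: int
    total_size_bytes: int


def table_stats(partitions: Iterable[PartitionStats]) -> TableStats:
    """Sum per-partition records into table-level totals."""
    rows = 0
    size = 0
    for p in partitions:
        rows += p.record_count
        size += p.total_size_bytes
    return TableStats(rows, size)


# Partitioned table: one record per partition.
partitioned = [
    PartitionStats(("2024-01-01",), 1_000, 64_000),
    PartitionStats(("2024-01-02",), 2_500, 160_000),
]

# Unpartitioned table: a single record with an empty partition tuple.
unpartitioned = [PartitionStats((), 3_500, 224_000)]
```

Under this assumption, the same `table_stats` function serves both cases, which is the point of treating an unpartitioned table as a table with one partition: the reader needs no second code path.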
