szehon-ho edited a comment on issue #2326: URL: https://github.com/apache/iceberg/issues/2326#issuecomment-798797152
Nice. I think aggregate pushdown could make many queries in Iceberg faster overall, given how much metadata we keep, as long as we are OK answering queries from that metadata. I think #2182 is interesting, but it may not be worth the cost if it adds a lot of time to each commit.

@aokolnychyi I tried the equivalent query on the files table today, and it is much faster (~minute vs ~10s of minutes). I even added a distinct at the end and it did not add much time. It's a shame that users reach for the partitions table first, rather than the files table, when listing partitions. I guess there's not much we can do unless we have the Spark view catalog support mentioned in that PR?

By the way, as Russell pointed out to me, I was looking at an improvement: adding predicate pushdown that uses ManifestEvaluator to filter out manifest files, since the manifest list already has each manifest file's partition min/max. If I understand correctly, this requires converting a filter on the partitions table (partition.part_field = x) into a "partition filter" (part_field = x). That makes me wonder whether this functionality will stay compatible with later rewriting the partitions table as a view over the files table. I guess it would, since we should have an equivalent partition filter on the files table as well (partition.part_field = x, which is not there today).
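To make the filter conversion and manifest pruning idea concrete, here is a minimal sketch in plain Python. It is not the Iceberg API and does not call ManifestEvaluator. `ManifestSummary`, `to_partition_filter`, `might_match`, and `prune_manifests` are hypothetical names, and it assumes a single equality predicate and one min/max pair per partition field.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

PARTITION_PREFIX = "partition."


@dataclass
class ManifestSummary:
    """Hypothetical stand-in for one manifest-list entry."""
    path: str
    # partition field name -> (lower bound, upper bound)
    bounds: Dict[str, Tuple[Any, Any]]


def to_partition_filter(column: str, value: Any) -> Tuple[str, Any]:
    """Rewrite `partition.part_field = x` into `part_field = x`."""
    if not column.startswith(PARTITION_PREFIX):
        raise ValueError(f"not a partition column: {column}")
    return column[len(PARTITION_PREFIX):], value


def might_match(manifest: ManifestSummary, field: str, value: Any) -> bool:
    """Return False only when the min/max range rules the value out."""
    bounds = manifest.bounds.get(field)
    if bounds is None:
        return True  # no stats for this field, so the manifest must be kept
    lower, upper = bounds
    return lower <= value <= upper


def prune_manifests(
    manifests: List[ManifestSummary], column: str, value: Any
) -> List[ManifestSummary]:
    """Keep only manifests that may contain rows matching the filter."""
    field, value = to_partition_filter(column, value)
    return [m for m in manifests if might_match(m, field, value)]


manifests = [
    ManifestSummary("m1.avro", {"part_field": (1, 10)}),
    ManifestSummary("m2.avro", {"part_field": (11, 20)}),
    ManifestSummary("m3.avro", {}),  # no min/max recorded
]
kept = prune_manifests(manifests, "partition.part_field", 15)
print([m.path for m in kept])  # ['m2.avro', 'm3.avro']
```

The pruning is conservative: a manifest is dropped only when its range excludes the value, so a manifest without stats is always read. The same prefix-stripping step would apply if the filter came from a files-table column named partition.part_field.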
