szehon-ho commented on issue #2326: URL: https://github.com/apache/iceberg/issues/2326#issuecomment-798797152
Nice, I think aggregate pushdown could potentially enable many faster queries in Iceberg overall, given how much metadata we have, if we are OK with answering queries from it. I think #2182 is interesting, but it may not be worth the cost if it adds a lot more time to each commit.

@aokolnychyi I tried running the equivalent query on the files table today, and it is much faster (about a minute vs. tens of minutes). I even added a distinct at the end and it did not add much time. It's a shame that users try the partitions table first rather than the files table. I guess there's not much we can do unless we have this support?

By the way, as Russell pointed out to me, I was looking at making an improvement by adding predicate pushdown, using ManifestEvaluator to filter out manifest files, since the manifest list has each manifest file's partition min/max. If I understand correctly, it requires converting a filter on the partitions table (partition.part_field = x) to a ManifestGroup "partition filter" (part_field = x). Now I am wondering whether this functionality will be compatible with later rewriting the partitions table as a view of the files table (I guess at that point we would make the equivalent pushdown on the files table using the 'partition.x' field, which is not done there today either).
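
To make the filter conversion concrete, here is a minimal Python sketch of the idea, not Iceberg's actual API. The names `ManifestSummary`, `to_partition_filter`, `might_match`, and `prune` are all hypothetical, and it only handles a single equality predicate, whereas the real ManifestEvaluator is Java and works on general expressions. It assumes each manifest-list entry exposes a min and max per partition field.

```python
from dataclasses import dataclass, field

PREFIX = "partition."  # how the partitions table exposes partition columns


@dataclass
class ManifestSummary:
    """Hypothetical stand-in for one manifest-list entry."""
    path: str
    lower: dict = field(default_factory=dict)  # partition field -> min value
    upper: dict = field(default_factory=dict)  # partition field -> max value


def to_partition_filter(column, value):
    """Rewrite ('partition.part_field', x) into ('part_field', x)."""
    if not column.startswith(PREFIX):
        raise ValueError(f"not a partition column: {column}")
    return column[len(PREFIX):], value


def might_match(manifest, part_field, value):
    """Equality test against min/max; keep the manifest if bounds are missing."""
    lo = manifest.lower.get(part_field)
    hi = manifest.upper.get(part_field)
    if lo is None or hi is None:
        return True  # no stats, so we cannot safely skip it
    return lo <= value <= hi


def prune(manifests, column, value):
    """Drop manifests whose partition range cannot contain the value."""
    part_field, v = to_partition_filter(column, value)
    return [m for m in manifests if might_match(m, part_field, v)]


manifests = [
    ManifestSummary("m1.avro", {"part_field": 1}, {"part_field": 5}),
    ManifestSummary("m2.avro", {"part_field": 6}, {"part_field": 9}),
    ManifestSummary("m3.avro"),  # no bounds recorded
]
kept = prune(manifests, "partition.part_field", 7)
print([m.path for m in kept])  # ['m2.avro', 'm3.avro']
```

The point of the sketch is that the only translation step is stripping the `partition.` prefix; after that, pruning is a range check per manifest, so only the surviving manifests need to be opened.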
