ZachDischner commented on issue #7774: URL: https://github.com/apache/iceberg/issues/7774#issuecomment-1642428231
Interesting. I ran an experiment: I spun up two side-by-side EMR clusters as a controlled test, EMR 6.10 (Spark 3.3.1, Iceberg 1.1.0) and EMR 6.11 (Spark 3.3.2, Iceberg 1.2.0). I verified that the same operations on the `files` metadata tables read exactly the _same_ amount of input data on both clusters, which indicates to me that predicates are working. Yet the same query (`SELECT * FROM <preamble>.files`) takes 2x as long on EMR 6.11 as it does on 6.10. So the warning about predicates not being pushed seems like a red herring: they _are_ being pushed, but the operation still takes longer. I'm now guessing that the metadata is undergoing additional processing (maybe adding `readable_metrics`?), which is causing it to take longer to get a partition's file information.
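For anyone who wants to repeat the comparison, here is a minimal PySpark-style sketch of how the timing could be measured on each cluster. It is not the exact script used above. The helper names (`files_query`, `time_call`, `time_files_scan`) and the `drop_readable_metrics` flag are hypothetical, `spark` is assumed to be an existing `SparkSession` with the Iceberg catalog configured, and the table name stands in for the elided `<preamble>`. Dropping `readable_metrics` is only a rough probe: whether it actually lets the scan skip the extra work depends on column pruning, which is not verified here.

```python
import time


def files_query(table_name):
    """Build the metadata-table query from the comment for a given table."""
    return f"SELECT * FROM {table_name}.files"


def time_call(fn, repeats=3):
    """Run fn() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def time_files_scan(spark, table_name, repeats=3, drop_readable_metrics=False):
    """Time a full read of the `files` metadata table.

    `spark` is assumed to be a SparkSession (anything exposing .sql() works).
    """
    query = files_query(table_name)

    def run():
        df = spark.sql(query)
        if drop_readable_metrics:
            # DataFrame.drop is a no-op when the column does not exist,
            # so the same call can be run on both Iceberg versions.
            df = df.drop("readable_metrics")
        # collect() forces the whole scan to execute.
        df.collect()

    return time_call(run, repeats)
```

Running `time_files_scan(spark, "<preamble>")` on both clusters, with and without `drop_readable_metrics=True`, would give a rough indication of whether the extra column accounts for the difference.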
