alexandrefimov commented on issue #14995: URL: https://github.com/apache/iceberg/issues/14995#issuecomment-4604032175
I dug a bit more into the Spark side of this. This appears to be upstream Spark behavior rather than an Iceberg pruning bug. Iceberg can prune on once Spark passes a / predicate through the scan builder, but in this case Spark keeps the escaped expression above the . That means Iceberg only sees the pushed predicates shown in the node, and there is no / predicate available for Iceberg file pruning.

There is relevant Spark history here:

- SPARK-33677 / apache/spark#30625 intentionally changed to skip patterns containing an escape character.
- SPARK-38168 / apache/spark#35465 later attempted to handle escape characters in , but that PR was closed and the JIRA was resolved as Won't Fix.

The maintainer feedback on apache/spark#35465 was that handling escaped patterns in was a code-simplicity versus performance trade-off, that escape-character patterns were considered rare, and that users could call / directly instead.

Given that, I don't think there is an Iceberg-side fix unless Spark pushes an equivalent predicate to Iceberg. The practical workaround is to use directly. If this behavior should change, it likely needs a new Spark-side proposal that specifically calls out the DataSource V2 predicate pushdown / Iceberg file-pruning impact.
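To make the escape-character case concrete, here is a minimal Python sketch of the check a LIKE-to-prefix rewrite has to make before it can hand the scan a starts-with predicate instead of leaving a LIKE filter above it. A pattern such as `'abc\_%'` qualifies because it is a run of literal characters followed by one unescaped trailing `%`, while `'abc\%'` does not, because its final `%` is escaped and the pattern is an exact match. `literal_prefix` is a hypothetical helper written for illustration; it is not Spark's optimizer code. The rule that an escape may only precede `_`, `%`, or itself is an assumption modeled on Spark's LIKE semantics, and this sketch returns `None` rather than raising an error on any other escape.

```python
from typing import Optional

# Wildcards in SQL LIKE. In this sketch (an assumption modeled on Spark's LIKE
# rules), an escape may only precede one of these or the escape character itself.
_WILDCARDS = ("%", "_")


def literal_prefix(pattern: str, escape: str = "\\") -> Optional[str]:
    """Hypothetical helper: if `pattern` is '<literal text>%' under SQL LIKE
    rules, return the literal text so the filter could be expressed as a
    prefix (starts-with) predicate. Otherwise return None.
    """
    # A prefix pattern must end with '%'; whether that '%' is itself escaped
    # is decided while scanning the body below.
    if not pattern.endswith("%"):
        return None
    body = pattern[:-1]
    prefix = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == escape:
            # If the escape is the last body character, it escapes the trailing
            # '%', so the pattern is an exact match, not a prefix match.
            # An escape before any other character is treated as not simplifiable.
            if i + 1 >= len(body) or body[i + 1] not in _WILDCARDS + (escape,):
                return None
            prefix.append(body[i + 1])  # escaped character is taken literally
            i += 2
        elif ch in _WILDCARDS:
            # An unescaped wildcard inside the body: not a simple prefix pattern.
            return None
        else:
            prefix.append(ch)
            i += 1
    return "".join(prefix)


if __name__ == "__main__":
    print(literal_prefix("abc\\_%"))  # abc_
    print(literal_prefix("abc\\%"))   # None
```

Under these assumptions, the escaped pattern reduces to the same prefix test a user would get by calling a starts-with predicate directly, which is why doing that by hand sidesteps the missing simplification.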
