alexandrefimov commented on issue #14995:
URL: https://github.com/apache/iceberg/issues/14995#issuecomment-4604032175

   I dug a bit more into the Spark side of this.
   
   This appears to be upstream Spark behavior rather than an Iceberg pruning 
bug. Iceberg can prune on  once Spark passes a  /  predicate through the scan 
builder, but in this case Spark keeps the escaped  expression above the . That 
means Iceberg only sees the pushed predicates shown in the  node, and there is 
no / predicate available for Iceberg file pruning.
   
   There is relevant Spark history here:
   
   - SPARK-33677 / apache/spark#30625 intentionally changed  to skip patterns 
containing an escape character.
   - SPARK-38168 / apache/spark#35465 later attempted to handle escape 
characters in , but that PR was closed and the JIRA was resolved as Won't Fix.
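
To make the effect of that history concrete, here is a minimal toy model in Python of how a LIKE pattern might be rewritten into a simpler, pushable string predicate, and why an escape character blocks the rewrite. This is not Spark's implementation: the function name `simplify_like`, its return shape, and the decision to bail out on `_` are assumptions made purely for illustration.

```python
from typing import Optional, Tuple


def simplify_like(pattern: str, escape: str = "\\") -> Optional[Tuple[str, str]]:
    """Toy model of simplifying a LIKE pattern (hypothetical helper).

    Returns (kind, literal) when the pattern reduces to a plain string
    predicate, or None when the LIKE expression has to stay as-is.
    """
    # Mirrors the behavior described above: any pattern containing the
    # escape character is skipped, so no simpler predicate is produced.
    if escape in pattern:
        return None
    # This toy model does not handle the single-character wildcard.
    if "_" in pattern:
        return None

    body = pattern.strip("%")
    # A wildcard in the middle (e.g. 'a%b') is out of scope for this sketch.
    if "%" in body:
        return None
    if "%" not in pattern:
        return ("equals", pattern)
    if not body:
        return None  # pattern consists only of wildcards

    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    if leading and trailing:
        return ("contains", body)
    if trailing:
        return ("startswith", body)
    return ("endswith", body)


if __name__ == "__main__":
    print(simplify_like("abc%"))      # prefix pattern, no escape character
    print(simplify_like("a\\_bc%"))   # escaped underscore: left as LIKE
```

In this model the escaped pattern yields `None`, which corresponds to the case where the original expression stays in place and no simpler predicate is available to push down to the data source.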
   
   The maintainer feedback on apache/spark#35465 was that handling escaped 
patterns in  was a code-simplicity vs performance trade-off, escape-character 
patterns were considered rare, and users could call  /  directly instead.
   
   Given that, I don't think there is an Iceberg-side fix unless Spark pushes 
an equivalent predicate to Iceberg. The practical workaround is to use  
directly. If this behavior should change, it likely needs a new Spark-side 
proposal that specifically calls out the DataSource V2 predicate pushdown / 
Iceberg file-pruning impact.



