Re: [PR] adapt filter expressions to file schema during parquet scan [datafusion]

via GitHub Tue, 24 Jun 2025 10:21:11 -0700


adriangb commented on PR #16461:
URL: https://github.com/apache/datafusion/pull/16461#issuecomment-3000255690


   @kosiew I'm not sure I agree with the conclusions there. Why can't we use 
expressions to do the schema adapting during the scan? As @alamb pointed out in 
https://github.com/apache/datafusion/pull/16461#issuecomment-2997870791, it's 
entirely possible to feed a RecordBatch into an expression and get back a new 
array. So unless I'm missing something, I don't think these are correct:
   
   > Expression rewriting is great for pushdown but batch-level adapters are 
needed for correct, shaped data.
   > No effect on RecordBatch structure.
   > Limited scope (only predicates and pruning).
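   For illustration, here is a minimal std-only Rust sketch of that point. Each output column is produced by evaluating an expression against the file's batch, so the expressions fully determine the shape of the resulting batch. `Array`, `RecordBatch`, `PhysicalExpr`, `Column`, `CastToInt64`, `NullInt64`, `adapt_batch` and `table_exprs` are hypothetical toy stand-ins, not the arrow-rs or DataFusion types (DataFusion's own `PhysicalExpr::evaluate` takes an Arrow `RecordBatch` and returns a `ColumnarValue`). The scenario is also assumed: the table schema is `a: Int64, b: Int64`, and this particular file stores only `a`, as Int32.

```rust
// Toy, std-only stand-ins for Arrow/DataFusion types (hypothetical, not their real APIs).

/// A nullable column of values (stand-in for an Arrow array).
#[derive(Debug, Clone, PartialEq)]
enum Array {
    Int32(Vec<Option<i32>>),
    Int64(Vec<Option<i64>>),
}

/// Equally long columns read from one file (stand-in for a RecordBatch).
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch {
    num_rows: usize,
    columns: Vec<Array>,
}

/// An expression is evaluated against a whole batch and returns a new array.
trait PhysicalExpr {
    fn evaluate(&self, batch: &RecordBatch) -> Array;
}

/// Reads the column at `index` of the file batch unchanged.
struct Column {
    index: usize,
}

impl PhysicalExpr for Column {
    fn evaluate(&self, batch: &RecordBatch) -> Array {
        batch.columns[self.index].clone()
    }
}

/// Widens Int32 input to Int64 (the file stores Int32, the table declares Int64).
struct CastToInt64 {
    input: Box<dyn PhysicalExpr>,
}

impl PhysicalExpr for CastToInt64 {
    fn evaluate(&self, batch: &RecordBatch) -> Array {
        match self.input.evaluate(batch) {
            Array::Int32(values) => {
                Array::Int64(values.into_iter().map(|v| v.map(i64::from)).collect())
            }
            already_int64 => already_int64,
        }
    }
}

/// All-null Int64 column for a table column that this file does not contain.
struct NullInt64;

impl PhysicalExpr for NullInt64 {
    fn evaluate(&self, batch: &RecordBatch) -> Array {
        Array::Int64(vec![None; batch.num_rows])
    }
}

/// Evaluates one expression per table column to build a table-shaped batch.
fn adapt_batch(file_batch: &RecordBatch, exprs: &[Box<dyn PhysicalExpr>]) -> RecordBatch {
    RecordBatch {
        num_rows: file_batch.num_rows,
        columns: exprs.iter().map(|expr| expr.evaluate(file_batch)).collect(),
    }
}

/// Assumed table schema (a: Int64, b: Int64) for a file holding only a: Int32.
fn table_exprs() -> Vec<Box<dyn PhysicalExpr>> {
    let a: Box<dyn PhysicalExpr> = Box::new(CastToInt64 {
        input: Box::new(Column { index: 0 }),
    });
    let b: Box<dyn PhysicalExpr> = Box::new(NullInt64);
    vec![a, b]
}

fn main() {
    let file_batch = RecordBatch {
        num_rows: 3,
        columns: vec![Array::Int32(vec![Some(1), None, Some(3)])],
    };
    let table_batch = adapt_batch(&file_batch, &table_exprs());
    // Two Int64 columns: `a` widened from Int32, `b` filled with NULLs.
    println!("{:?}", table_batch.columns);
}
```

Evaluating `table_exprs()` against the one-column file batch yields two Int64 columns, so in this sketch the expressions change the RecordBatch structure rather than only filtering it.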
   
   > Possibly poorer performance due to repeated expression rewrites.

   There are no more expression rewrites than there are SchemaAdapters created. 
SchemaAdapters aren't cached either; they are created for each file.
   
   I'll put together an example to show how predicate rewrites can be used to 
reshape data. But also, FWIW, that's exactly how ProjectionExec works: it 
evaluates a list of physical expressions against each input batch.
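   To illustrate the predicate side, here is a minimal std-only Rust sketch of adapting a filter to one file's schema before evaluating it. The `Expr` tree, `rewrite`, `evaluate`, `DataType`, `Array` and `table_predicate` are hypothetical simplifications, not DataFusion's `PhysicalExpr` machinery, and the data is made up. The table schema is assumed to be `a: Int64, b: Int64`, while this file stores only `a`, as Int32. The filter `a > 1 OR b > 10` is rewritten so that `a` becomes a cast of file column 0 and `b` becomes a NULL literal, after which it runs directly against the file's columns.

```rust
// Std-only sketch: rewrite a filter written against the *table* schema so it can be
// evaluated directly on a batch from one *file*. All names here are hypothetical
// simplifications, not DataFusion's API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),    // table column by name (before rewriting)
    FileColumn(usize), // file column by index (after rewriting)
    CastToInt64(Box<Expr>),
    NullInt64,
    LitInt64(i64),
    Gt(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum Array {
    Int32(Vec<Option<i32>>),
    Int64(Vec<Option<i64>>),
    Bool(Vec<Option<bool>>),
}

/// Resolves table columns against one file's schema. Every table column is assumed
/// to be Int64: Int32 file columns get a cast, columns the file lacks become NULL.
fn rewrite(expr: &Expr, schema: &[(&str, DataType)]) -> Expr {
    match expr {
        Expr::Column(name) => match schema.iter().position(|(n, _)| *n == name.as_str()) {
            Some(i) if schema[i].1 == DataType::Int32 => {
                Expr::CastToInt64(Box::new(Expr::FileColumn(i)))
            }
            Some(i) => Expr::FileColumn(i),
            None => Expr::NullInt64,
        },
        Expr::CastToInt64(e) => Expr::CastToInt64(Box::new(rewrite(e, schema))),
        Expr::Gt(l, r) => Expr::Gt(Box::new(rewrite(l, schema)), Box::new(rewrite(r, schema))),
        Expr::Or(l, r) => Expr::Or(Box::new(rewrite(l, schema)), Box::new(rewrite(r, schema))),
        other => other.clone(),
    }
}

/// Evaluates a rewritten expression against the file's columns.
fn evaluate(expr: &Expr, cols: &[Array], rows: usize) -> Array {
    match expr {
        Expr::Column(name) => panic!("column `{name}` must be rewritten first"),
        Expr::FileColumn(i) => cols[*i].clone(),
        Expr::CastToInt64(e) => match evaluate(e, cols, rows) {
            Array::Int32(v) => Array::Int64(v.into_iter().map(|x| x.map(i64::from)).collect()),
            other => other,
        },
        Expr::NullInt64 => Array::Int64(vec![None; rows]),
        Expr::LitInt64(n) => Array::Int64(vec![Some(*n); rows]),
        Expr::Gt(l, r) => match (evaluate(l, cols, rows), evaluate(r, cols, rows)) {
            // Comparing anything with NULL yields NULL.
            (Array::Int64(l), Array::Int64(r)) => Array::Bool(
                l.iter().zip(&r).map(|p| match p { (Some(a), Some(b)) => Some(a > b), _ => None }).collect(),
            ),
            _ => panic!("Gt expects two Int64 inputs"),
        },
        Expr::Or(l, r) => match (evaluate(l, cols, rows), evaluate(r, cols, rows)) {
            // SQL three-valued OR: TRUE wins, FALSE needs both sides FALSE, otherwise NULL.
            (Array::Bool(l), Array::Bool(r)) => Array::Bool(
                l.iter()
                    .zip(&r)
                    .map(|p| match p {
                        (Some(true), _) | (_, Some(true)) => Some(true),
                        (Some(false), Some(false)) => Some(false),
                        _ => None,
                    })
                    .collect(),
            ),
            _ => panic!("Or expects two Bool inputs"),
        },
    }
}

/// Hypothetical filter `a > 1 OR b > 10` over the table schema (a: Int64, b: Int64).
fn table_predicate() -> Expr {
    let gt = |col: &str, n: i64| {
        Expr::Gt(Box::new(Expr::Column(col.to_string())), Box::new(Expr::LitInt64(n)))
    };
    Expr::Or(Box::new(gt("a", 1)), Box::new(gt("b", 10)))
}

fn main() {
    // This file stores only `a`, and stores it as Int32.
    let file_schema = [("a", DataType::Int32)];
    let file_cols = vec![Array::Int32(vec![Some(1), None, Some(3)])];
    let file_predicate = rewrite(&table_predicate(), &file_schema);
    println!("{file_predicate:?}");
    println!("{:?}", evaluate(&file_predicate, &file_cols, 3));
}
```

With the sample data the rewritten filter evaluates to `[NULL, NULL, true]`, the same result the original filter gives on table-shaped data where `b` is all NULL. Applying `rewrite` to the bare references `a` and `b` and evaluating them would likewise produce the widened Int64 `a` column and an all-NULL `b` column, so in this sketch one rewrite step adapts both predicates and projected data.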
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

