adriangb opened a new issue, #17114:
URL: https://github.com/apache/datafusion/issues/17114

   While working on https://github.com/apache/datafusion/pull/16589 we realized 
that there are now two paths of casting / adaptation logic:
   1. `SchemaAdapter` which now supports nested structs as of 
https://github.com/apache/datafusion/pull/16371
   2. The `Cast` expr (e.g. `select 1::text` in SQL, or implicit casts), which 
uses the arrow cast kernel. That kernel does _not_ support nested structs and 
similar cases.
   
   It would be good to unify these.
   
   There was discussion of this very point in 
https://github.com/apache/arrow-rs/issues/7176 and one thing that came up was 
to have arrow develop some sort of `SchemaAdapter` for itself.
   
   One important performance consideration, and maybe something to have a 
broader discussion on, is that `SchemaAdapter` can pre-compute the work to be 
done and then avoid any sort of introspection in the hot path. This is not 
possible with a `PhysicalExpr`, which has no step where it sees the input 
schema ahead of evaluation.
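
To make the pre-computation point concrete, here is a minimal std-only sketch. 
Everything in it (`Value`, `Kind`, `Step`, `plan`, `apply`) is a hypothetical 
toy stand-in, not the real `SchemaAdapter` or arrow-rs API. The idea it 
illustrates: all name lookups and type comparisons happen once in `plan`, and 
the per-batch `apply` only replays the resulting steps.

```rust
use std::collections::HashMap;

// Toy stand-ins for Arrow values and types; NOT the arrow-rs / DataFusion API.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
    Null,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Int,
    Text,
}

type Schema = Vec<(String, Kind)>;
type Batch = Vec<Vec<Value>>; // one Vec<Value> per column

/// What to do for one output column, decided once up front.
#[derive(Debug, PartialEq)]
enum Step {
    Copy(usize),      // take the source column as-is
    IntToText(usize), // cast the source column from Int to Text
    FillNull,         // column is missing in the source
}

/// Hypothetical planner: all schema introspection happens here, once.
fn plan(source: &Schema, target: &Schema) -> Result<Vec<Step>, String> {
    let index: HashMap<&str, usize> = source
        .iter()
        .enumerate()
        .map(|(i, (name, _))| (name.as_str(), i))
        .collect();
    target
        .iter()
        .map(|(name, want)| match index.get(name.as_str()) {
            None => Ok(Step::FillNull),
            Some(&i) => match (source[i].1, *want) {
                (have, want) if have == want => Ok(Step::Copy(i)),
                (Kind::Int, Kind::Text) => Ok(Step::IntToText(i)),
                (have, want) => Err(format!(
                    "unsupported cast {have:?} -> {want:?} for column {name}"
                )),
            },
        })
        .collect()
}

/// Hot path: no name lookups or type comparisons, just replay the steps.
fn apply(steps: &[Step], batch: &Batch, num_rows: usize) -> Batch {
    steps
        .iter()
        .map(|step| match step {
            Step::Copy(i) => batch[*i].clone(),
            Step::IntToText(i) => batch[*i]
                .iter()
                .map(|v| match v {
                    Value::Int(n) => Value::Text(n.to_string()),
                    other => other.clone(),
                })
                .collect(),
            Step::FillNull => vec![Value::Null; num_rows],
        })
        .collect()
}
```

A caller would run `plan(&source, &target)` once per file or stream and then 
call `apply` for every batch. A `Cast` expression that only sees the input at 
evaluation time has to redo the equivalent of `plan` on each call.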
   
   Thus I would like to propose the following rough course of action:
   1. Unify the code paths. This can be something as naive as dynamically 
building a `SchemaAdapter` each time a `Cast` PhysicalExpr gets called, or it 
could be something like refactoring the code to be shared.
   2. Think about some sort of `PhysicalExpr::optimize(inputs)` that can in 
this case pre-compute the needed casts and build efficient data structures to 
apply those in a loop. I think this could also benefit a lot of other 
expressions that need to do prep work for each execution.
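
As a rough illustration of step 2, the sketch below shows one possible shape 
for such a hook. It is an assumption-heavy toy: `ToyExpr`, `Cast`, 
`IntToFloat`, and the `optimize` signature are all hypothetical and are not 
part of DataFusion's `PhysicalExpr` trait today. The expression is told its 
input type at plan time and may hand back a specialized replacement.

```rust
use std::sync::Arc;

// Toy stand-ins; NOT the DataFusion / arrow-rs API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Int,
    Float,
}

#[derive(Clone, Debug, PartialEq)]
enum Column {
    Int(Vec<i64>),
    Float(Vec<f64>),
}

impl Column {
    fn kind(&self) -> Kind {
        match self {
            Column::Int(_) => Kind::Int,
            Column::Float(_) => Kind::Float,
        }
    }
}

trait ToyExpr {
    fn evaluate(&self, input: &Column) -> Result<Column, String>;

    /// Hypothetical hook: given the input type known at plan time, return a
    /// specialized expression, or `None` to keep using `self`.
    fn optimize(&self, _input: Kind) -> Option<Arc<dyn ToyExpr>> {
        None
    }
}

/// Generic cast: decides what to do from the input type on every call.
struct Cast {
    to: Kind,
}

impl ToyExpr for Cast {
    fn evaluate(&self, input: &Column) -> Result<Column, String> {
        match (input, self.to) {
            (Column::Int(v), Kind::Float) => {
                Ok(Column::Float(v.iter().map(|&x| x as f64).collect()))
            }
            (Column::Float(v), Kind::Int) => {
                Ok(Column::Int(v.iter().map(|&x| x as i64).collect()))
            }
            // Same source and target type: nothing to do.
            _ => Ok(input.clone()),
        }
    }

    fn optimize(&self, input: Kind) -> Option<Arc<dyn ToyExpr>> {
        // Only one specialization is sketched here.
        if input == Kind::Int && self.to == Kind::Float {
            let specialized: Arc<dyn ToyExpr> = Arc::new(IntToFloat);
            return Some(specialized);
        }
        None
    }
}

/// Specialized cast chosen at plan time for Int -> Float.
struct IntToFloat;

impl ToyExpr for IntToFloat {
    fn evaluate(&self, input: &Column) -> Result<Column, String> {
        match input {
            Column::Int(v) => Ok(Column::Float(v.iter().map(|&x| x as f64).collect())),
            other => Err(format!("planned for Int input, got {:?}", other.kind())),
        }
    }
}
```

With a single primitive column the specialization buys little. The payoff 
would come with nested structs, where the generic path has to walk and compare 
the field trees and the specialized expression could hold a precomputed plan 
instead.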



