Blizzara opened a new issue, #630:
URL: https://github.com/apache/datafusion-comet/issues/630

   ### What is the problem the feature request solves?
   
   Hey, we're looking to use DataFusion to replace some Spark workflows, but in 
a somewhat different way than Comet - we convert Spark logical plans into 
Substrait and that into DataFusion. Our goal is still similar to Comet: to get 
equivalent-but-faster execution compared to Spark. 
   
   We've noticed some differences in the Spark and DataFusion native 
expressions, which you guys have already addressed in Comet by adding custom 
expressions that match closer to Spark's, and so we'd like to be able to reuse 
the expressions from Comet where possible. 
   
   While that works today in theory, in practice it's a bit difficult given 
dependency issues between Comet's version of DataFusion and the version we use 
(just later commit on datafusion main atm, but still), and also pulling all of 
Comet for just the expressions is a bit unnecessary. So my question is - would 
you be open to separating the expressions into e.g. `datafusion-comet-exprs` 
crate, which could have a reduced set of dependencies and so be easier to reuse 
downstream? I'd be happy to write a PR if you'd find it acceptable. 
   
   I might need to also change some functions to be exported from that crate to 
reuse them, as Comet operates mainly on the physical expr level but for our use 
it's easier to have the expressions as ScalarUDFs, if that's okay.
   
   I tested already through a fork that reusing the Comet expressions is 
possible, and it does give better compatibility with Spark :)
   
   ### Describe the potential solution
   
   Rename `core` into `datafusion-comet/core`
   Move `core/src/execution/datafusion/expressions` into `datafusion-comet/expr`
   (probably some more changes to fix cross-dependencies, maybe introduce 
`datafusion-comet/common` to share stuff if needed)
   Open for other suggestions as well!
   
   ### Additional context
   
   Related: https://github.com/apache/datafusion/issues/11201


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to