GitHub user mauropagano closed a discussion: How does Datafusion compare with Arrow Compute functions?
Hi,

Apologies if this is a trivial question, but I can't seem to answer it myself.

In the last 2-3 major releases, Arrow compute has grown quite a bit, including functionality that I thought was "on Datafusion" to provide (group by, joins, exec plans, etc.). I understand the interface is different (e.g. SQL is an option in Datafusion), but in Python there seems to be some overlap. I also understand that the distributed nature of Ballista and the extensibility (e.g. UDFs) that Datafusion brings are clear differentiators.

How should one reason about when to use Datafusion vs. straight Arrow, for example to aggregate data from a Parquet file? Or are these core operations now provided by Arrow compute, with Datafusion focusing on higher-level operations?

Thanks,
Mauro

GitHub link: https://github.com/apache/datafusion/discussions/3079
