Re: [D] How does Datafusion compare with Arrow Compute functions? [datafusion]

via GitHub Thu, 25 Sep 2025 19:46:47 -0700


GitHub user mauropagano closed a discussion: How does Datafusion compare with 
Arrow Compute functions?


Hi,

Apologies if this is a trivial question but I can't seem to answer it myself.

Over the last 2-3 major releases, Arrow compute has grown quite a bit, including 
functionality that I thought was DataFusion's job to provide (group by, joins, 
exec plans, etc.).

I understand the interface is different (e.g. SQL is an option in DataFusion), 
but in Python there seems to be some overlap.
I also understand that the distributed nature of Ballista and the extensibility 
(e.g. UDFs) that DataFusion brings are clear differentiators.

How should one reason about when to use DataFusion vs. plain Arrow, for 
example to aggregate data from a Parquet file?
Or are these core operations now provided by Arrow compute, with DataFusion 
focusing on higher-level operations?

Thanks,
Mauro

GitHub link: https://github.com/apache/datafusion/discussions/3079

