GitHub user vrd83 closed a discussion: DataFusion for Data Engineering in Rust?

My understanding is that DataFusion is primarily an extensible query engine for 
engineers looking to build database systems (Influx and so on) without 
reinventing the wheel. 

That said, its Rust-based DataFrame API and SQL context sit at a high enough 
level of abstraction that it's tempting to start building Data Engineering 
pipelines in pure Rust. 😄 

An example of something I'd love to be able to do with DataFusion (I know some 
of this is already possible):

- With features and methods provided by DataFusion, query 
Delta Lake/Iceberg/MySQL/Postgres/ClickHouse/Influx and so on into DataFrame(s).
- Transform the data.
- Load the data into any of the above systems from DataFusion. 
- Should memory become a bottleneck, switch to a Ray cluster for distributed 
computing with relatively simple configuration. 

Ideally, the above should be possible for a Data Engineer who has no domain 
knowledge of database internals. I also appreciate that, because DataFusion 
uses Arrow for its memory format, some of this might be wishful thinking (or at 
least not straightforward to implement).

Is there a chance DataFusion will evolve in this direction, or will the focus 
remain on database systems? Is anyone else in the community using DataFusion in 
Rust for Data Engineering and, if so, what has your experience been?


GitHub link: https://github.com/apache/datafusion/discussions/13914

----
This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: 
github-unsubscr...@datafusion.apache.org


