GitHub user vrd83 closed a discussion: DataFusion for Data Engineering in Rust?
My understanding is that DataFusion is primarily an extensible query engine for engineers building database systems (Influx and so on) without reinventing the wheel. That said, its Rust-based DataFrame API and SQL context sit at a high enough level of abstraction that it's tempting to start building Data Engineering pipelines in pure Rust. 😄

An example of something I'd love to be able to do with DataFusion (I know some of this is already possible):

- Using features and methods provided by DataFusion, query DeltaLake/Iceberg/MySQL/Postgres/Clickhouse/Influx and so on into DataFrame(s).
- Transform the data.
- Load the data into any of the above systems from DataFusion.
- Should memory become a bottleneck, switch to a Ray cluster for distributed computing with relatively simple config.

Ideally, a Data Engineer without database-internals domain knowledge should be able to do all of the above. I also appreciate that, because DataFusion uses Arrow as its memory format, some of this might be wishful thinking (or at least not straightforward to implement).

Is there a chance DataFusion will evolve in this direction, or will the focus remain on database systems? Is anyone else in the community using DataFusion in Rust for Data Engineering, and if so, what has your experience been?

GitHub link: https://github.com/apache/datafusion/discussions/13914

----
This is an automatically sent email for github@datafusion.apache.org.