I just wanted to introduce myself to the group before I start asking lots
of questions. I'm a software engineer mostly working with
Scala/Spark/Kudu/Parquet in my day job, and in my spare time I have been
working on a POC of a distributed data platform implemented in Rust. The
project is called DataFusion (https://www.datafusion.rs/).
The project is at a very early stage and the implementation currently uses
very simple row-based processing, but the performance is already quite
exciting to me (my current test case runs 4x faster than on Apache Spark).
I have decided that I should now concentrate on making Apache Arrow the
native memory format so that I can implement more efficient data processing
and make it easier in the future to integrate with things like
Kudu and Parquet. It's also just a great way for me to learn about
I'm just in the process of getting Arrow compiling and reading the docs.
I'll be back soon with questions, I'm sure.