Hello Andy, one thing that came up in discussions in the past, and that also opened me up a bit to the parquet-cpp merge, is that merging code into a repo doesn't mean that it will always reside there. Apache has the infrastructure and guidelines to split a part of a project into a separate one. This is how the Java part of Arrow historically came out of Drill, and other projects like Calcite have spin-offs like Avatica that turn(ed) into their own projects (you should also have a look at Calcite, this might be useful for DataFusion).
Thus, if there is further interest in the Arrow Rust community, you should definitely consider whether some of this code would *currently* develop faster if it were part of the Arrow repo. If a separate community forms around it in the future and it is no longer tightly coupled with core Arrow development: make it a separate project again.

Cheers,
Uwe

On Sun, Jan 6, 2019, at 7:33 AM, Andy Grove wrote:
> Hi Wes,
>
> Yes, I have a SQL parser (actually this is a separate crate) and DataFusion
> has the query planner and execution engine. Here is a blog post from last
> summer with some performance comparisons with Apache Spark:
>
> https://andygrove.io/2018/05/datafusion-aggregate-performance/
>
> I have recently been updating the code to work with my fork of Arrow and
> currently it only works with CSV and not Parquet, but adding Parquet
> support again will be simple once the Arrow reader is added (others are
> working on this already).
>
> I guess I should write this up in more detail and we can open it up to a
> vote here to see if there is an appetite to donate and support this code
> here?
>
> Thanks,
>
> Andy.