Hello Andy,

One thing that came up in past discussions, and that also opened me up a bit to 
the parquet-cpp merge, is that merging code into a repo doesn't mean it will 
reside there forever. Apache has the infrastructure and guidelines to split 
part of a project into a separate one. This is how the Java part of Arrow 
historically came out of Drill, and other projects like Calcite have spin-offs 
such as Avatica that turn(ed) into their own projects (you should also have a 
look at Calcite, it might be useful for DataFusion).

Thus, if there is further interest in the Arrow Rust community, you should 
definitely consider whether some of this code would *currently* develop faster 
if it were part of the Arrow repo. If a separate community forms around it in 
the future and it is no longer tightly coupled with core Arrow development, 
make it a separate project again.

Cheers,

Uwe

On Sun, Jan 6, 2019, at 7:33 AM, Andy Grove wrote:
> Hi Wes,
> 
> Yes, I have a SQL parser (actually this is a separate crate) and DataFusion
> has the query planner and execution engine. Here is a blog post from last
> summer with some performance comparisons with Apache Spark:
> 
> https://andygrove.io/2018/05/datafusion-aggregate-performance/
> 
> I have recently been updating the code to work with my fork of Arrow and
> currently it only works with CSV and not Parquet, but adding Parquet
> support again will be simple once the Arrow reader is added (others are
> working on this already).
> 
> I guess I should write this up in more detail and we can open it up to a
> vote here to see if there is an appetite to donate and support this code
> here?
> 
> Thanks,
> 
> Andy.