Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Andy Grove Wed, 09 Jan 2019 06:18:49 -0800

Hi Neville,

Thanks for the support.


DataFrame and SQL are two different ways of building a logical query plan
and it makes sense that they should both build the same type of plan
without code duplication. It is also sometimes beneficial to mix and match
DataFrame and SQL operations (as per Apache Spark). I agree that this work
will help drive requirements for primitive operations which can be pushed
down into the core code.
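
To make the shared-plan idea concrete, here is a minimal sketch in Rust. It is
an illustration only: the names Expr, LogicalPlan, DataFrame, plan_sql and
sql_then_dataframe are hypothetical and are not DataFusion's actual API, and
the toy SQL front end accepts just one fixed query shape
(SELECT <cols> FROM <table> WHERE <col> > <int>). The point is that both front
ends produce the same plan type, and the SQL path reuses the DataFrame builder
rather than duplicating it.

```rust
// Hypothetical types for illustration; not DataFusion's real API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    LiteralInt(i64),
    Gt(Box<Expr>, Box<Expr>),
}

// A single logical plan type shared by both front ends.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    Scan { table: String },
    Filter { input: Box<LogicalPlan>, predicate: Expr },
    Projection { input: Box<LogicalPlan>, columns: Vec<String> },
}

// DataFrame-style front end: each call wraps the current plan in a new node.
struct DataFrame {
    plan: LogicalPlan,
}

impl DataFrame {
    fn scan(table: &str) -> Self {
        DataFrame { plan: LogicalPlan::Scan { table: table.to_string() } }
    }

    // Lets a plan built elsewhere (e.g. from SQL) continue as a DataFrame.
    fn from_plan(plan: LogicalPlan) -> Self {
        DataFrame { plan }
    }

    fn filter(self, predicate: Expr) -> Self {
        DataFrame {
            plan: LogicalPlan::Filter { input: Box::new(self.plan), predicate },
        }
    }

    fn select(self, columns: &[&str]) -> Self {
        DataFrame {
            plan: LogicalPlan::Projection {
                input: Box::new(self.plan),
                columns: columns.iter().map(|c| c.to_string()).collect(),
            },
        }
    }
}

// Toy SQL front end. Assumes whitespace-separated tokens and a column list
// without spaces; anything else returns None.
fn plan_sql(sql: &str) -> Option<LogicalPlan> {
    let t: Vec<&str> = sql.split_whitespace().collect();
    let shape_ok = t.len() == 8
        && t[0].eq_ignore_ascii_case("select")
        && t[2].eq_ignore_ascii_case("from")
        && t[4].eq_ignore_ascii_case("where")
        && t[6] == ">";
    if !shape_ok {
        return None;
    }
    let value: i64 = t[7].parse().ok()?;
    let predicate = Expr::Gt(
        Box::new(Expr::Column(t[5].to_string())),
        Box::new(Expr::LiteralInt(value)),
    );
    let columns: Vec<&str> = t[1].split(',').collect();
    // Reuse the DataFrame builder so plan construction lives in one place.
    Some(DataFrame::scan(t[3]).filter(predicate).select(&columns).plan)
}

// Mixing the two: start from SQL, then refine with a DataFrame operation.
fn sql_then_dataframe() -> LogicalPlan {
    let base = plan_sql("SELECT city,population FROM cities WHERE population > 1000")
        .expect("query matches the toy grammar");
    DataFrame::from_plan(base).select(&["city"]).plan
}
```

Because both paths end in the same LogicalPlan value, a later optimizer or
executor would only need to understand one representation.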

Thanks,

Andy.

On Tue, Jan 8, 2019 at 8:07 AM Neville Dipale <[email protected]> wrote:

> Hi Andy,
>
> I can't comment on the voting process, but regarding the addition of
> DataFusion:
>
> I support the idea to donate the code, mainly as I think that will help us
> accelerate some work on Rust. Out of curiosity, I've been prototyping a
> 'Rust dataframe' abstraction which (can/will) have various scalar,
> aggregation, array and window functions.
>
> I'm doing this trying to put on the hat of someone wanting to use Rust in
> their binary or library. I'm already finding some things that might be
> *core* but are still not yet implemented. The presence of array_ops is also
> helpful because in addition to an efficient in-memory rep of data, they
> enable one to do some basic data manipulation on such data.
>
> Having DataFusion added to Arrow could help fill some gaps in our codebase,
> and I'm willing to work there.
>
> Regards
> Neville
>
> On Tue, 8 Jan 2019 at 16:14, Andy Grove <[email protected]> wrote:
>
> > Bumping this thread ... I know everyone is busy with getting the 0.12
> > release out, but would be good to know the process for raising this for a
> > vote. However, given the lack of comments on this thread I'm starting to
> > suspect that maybe there isn't much of an appetite for this, which is fine,
> > but would be good to find out for sure.
> >
> > Thanks,
> >
> > Andy.
> >
> > On Mon, Jan 7, 2019 at 1:03 PM Andy Grove <[email protected]> wrote:
> >
> > > Thanks, Ted!
> > >
> > > I wish I'd been a bit more specific about my ask in the original email...
> > > I guess my question (for Wes?) is what is the process to raise this for a
> > > vote?
> > >
> > > Andy.
> > >
> > >
> > >
> > > On Sun, Jan 6, 2019 at 2:59 PM Ted Dunning <[email protected]> wrote:
> > >
> > >> Cool!
> > >>
> > >>
> > >>
> > >> On Sun, Jan 6, 2019 at 1:52 PM Andy Grove <[email protected]> wrote:
> > >>
> > >> > I'm starting a new thread for this discussion (this was previously
> > >> > discussed in the Rust Roadmap thread).
> > >> >
> > >> > The reason I got involved with Arrow is that I have been working on
> > >> > DataFusion[1] which is currently an in-process SQL query engine on top
> > >> > of Arrow. It allows queries to be executed against the Arrow CSV reader
> > >> > (and will shortly support the Arrow Parquet reader too) and presents
> > >> > results as a sequence of RecordBatch instances.
> > >> >
> > >> > I would like to donate this code to the Arrow project so that Arrow has
> > >> > a Rust-native query execution engine built in and to accelerate
> > >> > development of this capability.
> > >> >
> > >> > I have a fairly detailed roadmap[2] in mind for the project and it could
> > >> > eventually become a standalone project potentially (under ASF still).
> > >> >
> > >> > I don't know what the process is to vote on this, so wanted to discuss
> > >> > that in this thread first.
> > >> >
> > >> > References:
> > >> >
> > >> > [1] DataFusion: https://github.com/andygrove/datafusion
> > >> > [2] Roadmap:
> > >> > https://github.com/andygrove/datafusion/blob/master/ROADMAP.md
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Andy.
> > >> >
> > >>
> > >
> >
>
