Hi Andy and Jacques,

Thanks for setting the repo up. Yes, we are working on cleaning up the
internal repo and preparing to open a PR in the next few days.

It's a bit difficult to retain the original commit history in the PR,
though, since some of the commits contain internal info which we need to
remove before open sourcing. How about we just add a summary in the PR
itself, and add everyone who has contributed to it as a co-author of
the PR?

Chao

On Wed, Jan 17, 2024 at 11:09 AM Jacques Nadeau <jacq...@apache.org> wrote:
>
> Hey Chao, it would be great for you to share the code some place with
> commit history. (PR to the repo that Andy made or something else.)
>
> On Mon, Jan 15, 2024 at 7:38 AM Andy Grove <andygrov...@gmail.com> wrote:
>
> > Hi Chao,
> >
> > I have created https://github.com/apache/arrow-datafusion-comet and you
> > should be able to create a PR against the repo.
> >
> > Thanks,
> >
> > Andy.
> >
> > On Fri, Jan 12, 2024 at 3:45 PM Chao Sun <sunc...@apache.org> wrote:
> >
> > > Thanks all for the positive support!
> > >
> > > Andy, we plan to name the project Comet (BTW if you have better
> > > suggestions please let us know). Could you help to create a repo named
> > > arrow-datafusion-comet or arrow-comet? We'll clean up our internal
> > > repo and prepare for the donation in the next few days. Thanks for the
> > > help!
> > >
> > > Best,
> > > Chao
> > >
> > >
> > >
> > > On Fri, Jan 12, 2024 at 7:09 AM Andy Grove <andygrov...@gmail.com>
> > wrote:
> > > >
> > > > I think the next step here would be to create a new repo so that Chao
> > can
> > > > create a PR for the contribution, and then we can proceed to a vote.
> > > >
> > > > Chao - do you have a proposal for the name of the project? Given that
> > > this
> > > > is being donated to Apache Arrow, the repo name will start with
> > "arrow-".
> > > > Also, given that this is more of a DataFusion sub-project, I think it
> > > would
> > > > make sense to prefix the repo name with "arrow-datafusion-" and then
> > > rename
> > > > to "datafusion-" once we move the DataFusion projects to the new
> > > top-level
> > > > project.
> > > >
> > > > If the vote passes, we must complete the IP clearance process before
> > the
> > > PR
> > > > is accepted [1].
> > > >
> > > > [1] https://incubator.apache.org/ip-clearance/
> > > >
> > > >
> > > >
> > > > On Fri, Jan 12, 2024 at 12:36 AM Albert <zinki...@gmail.com> wrote:
> > > >
> > > > > Like Andrew Lamb mentioned, blaze-rs has similar goals, I'd really be
> > > > > interested to know some comparisons when the donations are made.
> > > > > All in all, I look forward to the new native project for spark
> > > > > acceleration.
> > > > >
> > > > > On Thu, Jan 11, 2024 at 9:50 PM Andrew Lamb <al...@influxdata.com>
> > > wrote:
> > > > >
> > > > > > I am very supportive of this donation. I know of at least one other
> > > > > > > DataFusion-based project, blaze-rs[1], which has the same design
> > > > goal, and
> > > > > > > bringing this project into the ASF may help consolidate these
> > efforts.
> > > > > >
> > > > > > As Andy said, I believe it was very valuable to have a major
> > consumer
> > > > > > project (e.g. DataFusion) to help drive the definition and
> > > implementation
> > > > > > > of arrow-rs. We never achieved the same synergy with
> > > > > > Ballista and DataFusion but I think it is more likely with a more
> > > > > actively
> > > > > > maintained Spark accelerator.
> > > > > >
> > > > > > I am not sure it affects this discussion, but the Gluten project,
> > > based
> > > > > on
> > > > > > > Velox, was accepted yesterday into the Apache Incubator[2].
> > > While the
> > > > > > functionality may be similar, the technology (Rust vs C/C++) and
> > the
> > > > > > communities are different so having both in the same (big) tent of
> > > the
> > > > > ASF
> > > > > > doesn't seem concerning to me.
> > > > > >
> > > > > > Also, as Chao says, I think this new sub project would naturally
> > > move to
> > > > > a
> > > > > > > new DataFusion top-level project when we get there (we plan a
> > > > proposed
> > > > > > > resolution for the April ASF board meeting).
> > > > > >
> > > > > > Looking forward to seeing more!
> > > > > > Andrew
> > > > > >
> > > > > > [1]: https://github.com/blaze-init/blaze
> > > > > > [2]:
> > > https://lists.apache.org/thread/6lrozds10jn9gknj9rf74lqbh7j55pq6
> > > > > >
> > > > > > On Wed, Jan 10, 2024 at 5:10 PM Andy Grove <andygrov...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hi Chao,
> > > > > > >
> > > > > > > This sounds like a really interesting project. I am interested in
> > > > > seeing
> > > > > > > how it compares to Spark RAPIDS (the project that I work on at
> > > NVIDIA)
> > > > > > and
> > > > > > > Intel's Gluten project (that works with Velox).
> > > > > > >
> > > > > > > > I can see the following benefits of having this project
> > under
> > > > > > Apache
> > > > > > > Arrow governance:
> > > > > > >
> > > > > > > - Assuming that this is a drop-in replacement that doesn't
> > require
> > > > > users
> > > > > > to
> > > > > > > change their code (as I imagine is the case), then it could lead
> > to
> > > > > > greater
> > > > > > > adoption of DataFusion, especially for more demanding use cases
> > > where
> > > > > > > processing on a single node is not possible.
> > > > > > > - Given that it has a deep integration with the Rust
> > > implementation of
> > > > > > > Arrow as well as DataFusion, and given the overlap of committers
> > > > > between
> > > > > > > these projects, having them under the same governance and
> > > communication
> > > > > > > channels will generally be more efficient than if this project is
> > > > > > separate.
> > > > > > > - Hopefully this leads to more upstream contributions to
> > > DataFusion,
> > > > > > > perhaps even allowing other projects such as Ballista to benefit
> > > from
> > > > > > > Spark-compatible operators and expressions in the future.
> > > > > > > - Having another project that uses DataFusion as a dependency
> > could
> > > > > help
> > > > > > > with stabilizing the public APIs and generally driving more
> > > innovation.
> > > > > > >
> > > > > > > Given these points, I would be supportive of a donation. I see it
> > > as
> > > > > > being
> > > > > > > similar to the Ballista project, which is already part of Arrow
> > > (and we
> > > > > > > plan to move along with DataFusion once it becomes a top-level
> > > > > project).
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Andy.
> > > > > > >
> > > > > > > On Wed, Jan 10, 2024 at 2:28 PM Chao Sun <sunc...@apache.org>
> > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > We have been working on a native execution engine for Apache
> > > Spark
> > > > > > > > that is heavily based on DataFusion and Arrow. Our goal is to
> > > > > > > > accelerate Spark query execution via delegating Spark's
> > physical
> > > plan
> > > > > > > > execution to DataFusion's highly modular execution framework,
> > > while
> > > > > > > > still maintaining the same semantics to Spark users (i.e., no
> > > Spark
> > > > > > > > behavior change from the end users' point of view). Several of
> > > us are
> > > > > > > > Spark and/or Arrow committers. At the moment, the project is
> > > under
> > > > > > > > active development and not yet feature complete. However, some
> > > of the
> > > > > > > > existing functionalities are relatively mature and have been
> > put
> > > in
> > > > > > > > production for a while now.
> > > > > > > >
> > > > > > > > Given the current momentum towards accelerating Spark through
> > > native
> > > > > > > > vectorized execution, we believe open sourcing this work will
> > > benefit
> > > > > > > > other Spark users too. In addition, we think the project itself
> > > can
> > > > > > > > also leverage the vibrant and strong community behind Arrow and
> > > > > > > > DataFusion, and grow faster. Because of this, we are exploring
> > > the
> > > > > > > > possibility of contributing this project to the Apache Software
> > > > > > > > Foundation (ASF) under the Apache Arrow project umbrella.
> > > > > > > >
> > > > > > > > We'd very much like to hear your opinion on this. Thanks.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Chao
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > ~~~~~~~~~~~~~~~
> > > > > no mistakes
> > > > > ~~~~~~~~~~~~~~~~~~
> > > > >
> > >
> >
