Thank you all for the great support and interest in this project!

On Sun, Feb 11, 2024 at 12:51 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> Congrats all! It's great to see the Arrow+DataFusion ecosystem expand in
> this way and to bring the work under the ASF umbrella.
>
> On Sun, Feb 11, 2024 at 5:02 AM Andrew Lamb <al...@influxdata.com> wrote:
>
> > As a follow-up: the acceptance vote [1] has passed, the IP Clearance
> > Process is complete [2], and the code PR is merged [3]!
> >
> > It is a very exciting time! Congratulations to all involved.
> >
> > Andrew
> >
> > [1]: https://lists.apache.org/thread/cyfyb96sssmpr73hhm7vh8jcdjbz8rsp
> > [2]: https://github.com/apache/arrow-datafusion-comet/pull/2
> > [3]: https://github.com/apache/arrow-datafusion-comet/pull/1
> >
> > On Wed, Jan 24, 2024 at 1:53 PM Jacques Nadeau <jacq...@apache.org> wrote:
> >
> > > For those that are interested wrt lang types/lines...
> > >
> > >
> > >
> > --------------------------------------------------------------------------------
> > > Language                      files          blank        comment
> > > code
> > >
> > >
> > --------------------------------------------------------------------------------
> > > Rust                             69           2701           2548
> > >  17154
> > > Scala                            69           2098           2595
> > >  12991
> > > Java                             41            926           1521
> > > 5505
> > > Maven                             4             71            156
> > > 1228
> > > Protocol Buffers                  3             96             65
> > >  417
> > > XML                               3             80             99
> > >  256
> > > Markdown                          5             69             80
> > >  190
> > > TOML                              2             14             38
> > >   90
> > > Bourne Shell                      1              9             39
> > >   65
> > > make                              1              5              1
> > >   62
> > > Bourne Again Shell                1             12             16
> > >   56
> > > YAML                              2              5             38
> > >   34
> > > Properties                        2              8             42
> > >   26
> > > SQL                               1              0              0
> > >    9
> > >
> > >
> > --------------------------------------------------------------------------------
> > > SUM:                            204           6094           7238
> > >  38083
> > >
> > >
> > --------------------------------------------------------------------------------
> > >
> > > On Wed, Jan 24, 2024 at 8:30 AM Chao Sun <sunc...@apache.org> wrote:
> > >
> > > > Thanks Jacques and everyone here for the feedback! We just created a
> > > > PR https://github.com/apache/arrow-datafusion-comet/pull/1 for the
> > > > donation vote and IP clearance. Please take a look there and provide
> > > > your valuable comments.
> > > >
> > > > Best,
> > > > Chao
> > > >
> > > > > On Thu, Jan 18, 2024 at 5:24 PM Jacques Nadeau <jacq...@apache.org> wrote:
> > > > > >
> > > > > > Yes, that was roughly what I was requesting (I was suggesting a single
> > > > > > PR with many commits that would be merged with the history).
> > > > > >
> > > > > > It's hard to provide a more concrete opinion on this without seeing the
> > > > > > quantity and complexity of the code. If it's 5,000 lines of code, it
> > > > > > probably doesn't matter. If it's 500,000, it's probably pretty important.
> > > > > > It also makes a difference whether 10 active Arrow/DataFusion committers
> > > > > > are already substantial contributors to the code, versus only a fairly
> > > > > > disjoint collection of people who are relatively inactive Arrow
> > > > > > community members.
> > > > > >
> > > > > > Don't take this as lack of excitement! The potential for contribution is
> > > > > > awesome and exciting!
> > > > > >
> > > > > > Part of making the contribution successful is making it as approachable
> > > > > > as possible to the rest of the community. I just want to find every way
> > > > > > possible that we can do that.
> > > > >
> > > > > Looking forward to seeing the code.
> > > > >
> > > > > > On Wed, Jan 17, 2024 at 10:13 AM Chao Sun <sunc...@apache.org> wrote:
> > > > > >
> > > > > > > Hi Jacques,
> > > > > > >
> > > > > > > Do you mean that, instead of a single PR, we modify (e.g., git commit
> > > > > > > --amend) all the commits that we have internally to remove any
> > > > > > > sensitive information, and open PRs for them against the above repo?
> > > > > > >
> > > > > > > I understand this will help readability and maintenance of the code,
> > > > > > > but it will be a lot of work (we have ~1000 commits) and much more
> > > > > > > difficult to pass our legal review (our company has pretty strict
> > > > > > > policies on open source, and all the commits need to be checked before
> > > > > > > they can go outside). In addition, we have already carefully added
> > > > > > > plenty of comments in the codebase for things that require non-trivial
> > > > > > > effort to understand.
> > > > > > >
> > > > > > > Given that all of our team members will be actively maintaining and
> > > > > > > contributing to this project (since it's already being widely used
> > > > > > > internally), we'd be happy to help further improve the readability and
> > > > > > > maintainability of the codebase and to resolve issues raised by the
> > > > > > > community. Will this work for you? We would really appreciate your
> > > > > > > understanding of our situation.
> > > > > >
> > > > > > Thanks,
> > > > > > Chao
> > > > > >
> > > > > > > On Wed, Jan 17, 2024 at 11:30 AM Jacques Nadeau <jacq...@apache.org> wrote:
> > > > > > > >
> > > > > > > > Thanks for the quick response Chao.
> > > > > > > >
> > > > > > > > My experience on these things is that maintaining commit history for
> > > > > > > > large codebases can be invaluable for tracking down issues. (Hey, why is
> > > > > > > > this code written this way? Oh, it was part of x patch that was trying
> > > > > > > > to achieve y.)
> > > > > > > >
> > > > > > > > In the past, I've used git commit replay type tools and filtering of
> > > > > > > > commit messages, subdirectories, etc. to get something prepped for
> > > > > > > > external consumption. My experience is that spending a few days now to
> > > > > > > > do this kind of thing saves far more days in the future (and leads to
> > > > > > > > higher quality).
> > > > > > >
> > > > > > > > On Wed, Jan 17, 2024 at 9:18 AM Chao Sun <sunc...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi Andy and Jacques,
> > > > > > > > >
> > > > > > > > > Thanks for setting the repo up. Yes, we are working on cleaning up the
> > > > > > > > > internal repo and preparing to open a PR in the next few days.
> > > > > > > > >
> > > > > > > > > It's a bit difficult to retain the original commit history in the PR,
> > > > > > > > > though, since some of the commits contain internal info which we need
> > > > > > > > > to remove upon open sourcing. How about we just add a summary in the PR
> > > > > > > > > itself, and add everyone that has contributed to it as a co-author of
> > > > > > > > > the PR?
> > > > > > > >
> > > > > > > > Chao
> > > > > > > >
> > > > > > > > > On Wed, Jan 17, 2024 at 11:09 AM Jacques Nadeau <jacq...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Hey Chao, it would be great for you to share the code some place with
> > > > > > > > > > commit history. (PR to the repo that Andy made or something else.)
> > > > > > > > >
> > > > > > > > > > On Mon, Jan 15, 2024 at 7:38 AM Andy Grove <andygrov...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Chao,
> > > > > > > > > > >
> > > > > > > > > > > I have created https://github.com/apache/arrow-datafusion-comet and
> > > > > > > > > > > you should be able to create a PR against the repo.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Andy.
> > > > > > > > > >
> > > > > > > > > > Andy.
> > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 12, 2024 at 3:45 PM Chao Sun <sunc...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks all for the positive support!
> > > > > > > > > > > >
> > > > > > > > > > > > Andy, we plan to name the project Comet (BTW if you have better
> > > > > > > > > > > > suggestions please let us know). Could you help to create a repo named
> > > > > > > > > > > > arrow-datafusion-comet or arrow-comet? We'll clean up our internal
> > > > > > > > > > > > repo and prepare for the donation in the next few days. Thanks for the
> > > > > > > > > > > > help!
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Chao
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 12, 2024 at 7:09 AM Andy Grove <andygrov...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think the next step here would be to create a new repo so that Chao
> > > > > > > > > > > > > can create a PR for the contribution, and then we can proceed to a vote.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Chao - do you have a proposal for the name of the project? Given that
> > > > > > > > > > > > > this is being donated to Apache Arrow, the repo name will start with
> > > > > > > > > > > > > "arrow-". Also, given that this is more of a DataFusion sub-project, I
> > > > > > > > > > > > > think it would make sense to prefix the repo name with
> > > > > > > > > > > > > "arrow-datafusion-" and then rename to "datafusion-" once we move the
> > > > > > > > > > > > > DataFusion projects to the new top-level project.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If the vote passes, we must complete the IP clearance process before
> > > > > > > > > > > > > the PR is accepted [1].
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://incubator.apache.org/ip-clearance/
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jan 12, 2024 at 12:36 AM Albert <zinki...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Like Andrew Lamb mentioned, blaze-rs has similar goals. I'd really be
> > > > > > > > > > > > > > interested to see some comparisons when the donation is made.
> > > > > > > > > > > > > > All in all, I look forward to the new native project for Spark
> > > > > > > > > > > > > > acceleration.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 11, 2024 at 9:50 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I am very supportive of this donation. I know of at least one other
> > > > > > > > > > > > > > > DataFusion-based project, blaze-rs [1], which has the same design goal,
> > > > > > > > > > > > > > > and bringing this project into the ASF may help consolidate these
> > > > > > > > > > > > > > > efforts.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As Andy said, I believe it was very valuable to have a major consumer
> > > > > > > > > > > > > > > project (e.g. DataFusion) to help drive the definition and
> > > > > > > > > > > > > > > implementation of arrow-rs. We never achieved the same synergy with
> > > > > > > > > > > > > > > Ballista and DataFusion, but I think it is more likely with a more
> > > > > > > > > > > > > > > actively maintained Spark accelerator.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I am not sure it affects this discussion, but the Gluten project, based
> > > > > > > > > > > > > > > on Velox, was accepted yesterday into the Apache Incubator [2]. While
> > > > > > > > > > > > > > > the functionality may be similar, the technology (Rust vs C/C++) and
> > > > > > > > > > > > > > > the communities are different, so having both in the same (big) tent of
> > > > > > > > > > > > > > > the ASF doesn't seem concerning to me.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Also, as Chao says, I think this new sub project would naturally move
> > > > > > > > > > > > > > > to a new DataFusion top level project when we get there (we plan a
> > > > > > > > > > > > > > > proposed resolution for the April ASF board meeting).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Looking forward to seeing more!
> > > > > > > > > > > > > > > Andrew
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [1]: https://github.com/blaze-init/blaze
> > > > > > > > > > > > > > > [2]: https://lists.apache.org/thread/6lrozds10jn9gknj9rf74lqbh7j55pq6
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jan 10, 2024 at 5:10 PM Andy Grove <andygrov...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Chao,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This sounds like a really interesting project. I am interested in
> > > > > > > > > > > > > > > > seeing how it compares to Spark RAPIDS (the project that I work on at
> > > > > > > > > > > > > > > > NVIDIA) and Intel's Gluten project (that works with Velox).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I can see the following benefits of having this project under Apache
> > > > > > > > > > > > > > > > Arrow governance:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Assuming that this is a drop-in replacement that doesn't require
> > > > > > > > > > > > > > > > users to change their code (as I imagine is the case), it could lead
> > > > > > > > > > > > > > > > to greater adoption of DataFusion, especially for more demanding use
> > > > > > > > > > > > > > > > cases where processing on a single node is not possible.
> > > > > > > > > > > > > > > > - Given that it has a deep integration with the Rust implementation of
> > > > > > > > > > > > > > > > Arrow as well as DataFusion, and given the overlap of committers
> > > > > > > > > > > > > > > > between these projects, having them under the same governance and
> > > > > > > > > > > > > > > > communication channels will generally be more efficient than if this
> > > > > > > > > > > > > > > > project is separate.
> > > > > > > > > > > > > > > > - Hopefully this leads to more upstream contributions to DataFusion,
> > > > > > > > > > > > > > > > perhaps even allowing other projects such as Ballista to benefit from
> > > > > > > > > > > > > > > > Spark-compatible operators and expressions in the future.
> > > > > > > > > > > > > > > > - Having another project that uses DataFusion as a dependency could
> > > > > > > > > > > > > > > > help with stabilizing the public APIs and generally driving more
> > > > > > > > > > > > > > > > innovation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Given these points, I would be supportive of a donation. I see it as
> > > > > > > > > > > > > > > > being similar to the Ballista project, which is already part of Arrow
> > > > > > > > > > > > > > > > (and which we plan to move along with DataFusion once it becomes a
> > > > > > > > > > > > > > > > top-level project).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Andy.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jan 10, 2024 at 2:28 PM Chao Sun <sunc...@apache.org> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We have been working on a native execution engine for Apache Spark
> > > > > > > > > > > > > > > > > that is heavily based on DataFusion and Arrow. Our goal is to
> > > > > > > > > > > > > > > > > accelerate Spark query execution by delegating Spark's physical plan
> > > > > > > > > > > > > > > > > execution to DataFusion's highly modular execution framework, while
> > > > > > > > > > > > > > > > > still maintaining the same semantics for Spark users (i.e., no Spark
> > > > > > > > > > > > > > > > > behavior change from the end users' point of view). Several of us are
> > > > > > > > > > > > > > > > > Spark and/or Arrow committers. At the moment, the project is under
> > > > > > > > > > > > > > > > > active development and not yet feature complete. However, some of the
> > > > > > > > > > > > > > > > > existing functionality is relatively mature and has been in
> > > > > > > > > > > > > > > > > production for a while now.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Given the current momentum towards accelerating Spark through native
> > > > > > > > > > > > > > > > > vectorized execution, we believe open sourcing this work will benefit
> > > > > > > > > > > > > > > > > other Spark users too. In addition, we think the project itself can
> > > > > > > > > > > > > > > > > also leverage the vibrant and strong community behind Arrow and
> > > > > > > > > > > > > > > > > DataFusion, and grow faster. Because of this, we are exploring the
> > > > > > > > > > > > > > > > > possibility of contributing this project to the Apache Software
> > > > > > > > > > > > > > > > > Foundation (ASF) under the Apache Arrow project umbrella.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We'd very much like to hear your opinion on this. Thanks.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Chao
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > ~~~~~~~~~~~~~~~
> > > > > > > > > > > > > no mistakes
> > > > > > > > > > > > > ~~~~~~~~~~~~~~~~~~
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> >
