I am very supportive of this donation. I know of at least one other
DataFusion-based project, blaze-rs[1], which has the same design goal, and
bringing this project into the ASF may help consolidate these efforts.

As Andy said, I believe it was very valuable to have a major consumer
project (e.g. DataFusion) to help drive the definition and implementation
of arrow-rs. We never achieved the same synergy with
Ballista and DataFusion but I think it is more likely with a more actively
maintained Spark accelerator.

I am not sure it affects this discussion, but the Gluten project, based on
Velox, was accepted yesterday into the Apache Incubator[2]. While the
functionality may be similar, the technology (Rust vs C/C++) and the
communities are different so having both in the same (big) tent of the ASF
doesn't seem concerning to me.

Also, as Chao says, I think this new sub project would naturally move to a
new DataFusion top-level project when we get there (we plan a proposed
resolution for the April ASF board meeting).

Looking forward to seeing more!
Andrew

[1]: https://github.com/blaze-init/blaze
[2]: https://lists.apache.org/thread/6lrozds10jn9gknj9rf74lqbh7j55pq6

On Wed, Jan 10, 2024 at 5:10 PM Andy Grove <andygrov...@gmail.com> wrote:

> Hi Chao,
>
> This sounds like a really interesting project. I am interested in seeing
> how it compares to Spark RAPIDS (the project that I work on at NVIDIA) and
> Intel's Gluten project (that works with Velox).
>
> I can see the following benefits of having this project being under Apache
> Arrow governance:
>
> - Assuming that this is a drop-in replacement that doesn't require users to
> change their code (as I imagine is the case), then it could lead to greater
> adoption of DataFusion, especially for more demanding use cases where
> processing on a single node is not possible.
> - Given that it has a deep integration with the Rust implementation of
> Arrow as well as DataFusion, and given the overlap of committers between
> these projects, having them under the same governance and communication
> channels will generally be more efficient than if this project is separate.
> - Hopefully this leads to more upstream contributions to DataFusion,
> perhaps even allowing other projects such as Ballista to benefit from
> Spark-compatible operators and expressions in the future.
> - Having another project that uses DataFusion as a dependency could help
> with stabilizing the public APIs and generally driving more innovation.
>
> Given these points, I would be supportive of a donation. I see it as being
> similar to the Ballista project, which is already part of Arrow (and we
> plan to move along with DataFusion once it becomes a top-level project).
>
> Thanks,
>
> Andy.
>
> On Wed, Jan 10, 2024 at 2:28 PM Chao Sun <sunc...@apache.org> wrote:
>
> > Hi all,
> >
> > We have been working on a native execution engine for Apache Spark
> > that is heavily based on DataFusion and Arrow. Our goal is to
> > accelerate Spark query execution via delegating Spark's physical plan
> > execution to DataFusion's highly modular execution framework, while
> > still maintaining the same semantics to Spark users (i.e., no Spark
> > behavior change from the end users' point of view). Several of us are
> > Spark and/or Arrow committers. At the moment, the project is under
> > active development and not yet feature complete. However, some of the
> > existing functionalities are relatively mature and have been put in
> > production for a while now.
> >
> > Given the current momentum towards accelerating Spark through native
> > vectorized execution, we believe open sourcing this work will benefit
> > other Spark users too. In addition, we think the project itself can
> > also leverage the vibrant and strong community behind Arrow and
> > DataFusion, and grow faster. Because of this, we are exploring the
> > possibility of contributing this project to the Apache Software
> > Foundation (ASF) under the Apache Arrow project umbrella.
> >
> > We'd very much like to hear your opinion on this. Thanks.
> >
> > Best,
> > Chao
> >
>
