Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

Micah Kornfield Wed, 10 Jan 2024 13:45:33 -0800

Hi Chao,
Very cool. I think this is something that a lot of people are interested
in. The main questions I have are:
1. Would Spark itself not be a reasonable place for this work?
2. Do you anticipate this would move with DataFusion to its own top-level
project [1] if that happens, or stay within the Arrow project?


Thanks,
Micah

[1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341

On Wed, Jan 10, 2024 at 1:28 PM Chao Sun <sunc...@apache.org> wrote:

> Hi all,
>
> We have been working on a native execution engine for Apache Spark
> that is heavily based on DataFusion and Arrow. Our goal is to
> accelerate Spark query execution via delegating Spark's physical plan
> execution to DataFusion's highly modular execution framework, while
> still maintaining the same semantics to Spark users (i.e., no Spark
> behavior change from the end users' point of view). Several of us are
> Spark and/or Arrow committers. At the moment, the project is under
> active development and not yet feature complete. However, some of the
> existing functionalities are relatively mature and have been put in
> production for a while now.
>
> Given the current momentum towards accelerating Spark through native
> vectorized execution, we believe open sourcing this work will benefit
> other Spark users too. In addition, we think the project itself can
> also leverage the vibrant and strong community behind Arrow and
> DataFusion, and grow faster. Because of this, we are exploring the
> possibility of contributing this project to the Apache Software
> Foundation (ASF) under the Apache Arrow project umbrella.
>
> We'd very much like to hear your opinion on this. Thanks.
>
> Best,
> Chao
>
