[DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

Chao Sun Wed, 10 Jan 2024 13:28:44 -0800

Hi all,

We have been working on a native execution engine for Apache Spark
that is heavily based on DataFusion and Arrow. Our goal is to
accelerate Spark query execution by delegating Spark's physical plan
execution to DataFusion's highly modular execution framework, while
still maintaining the same semantics for Spark users (i.e., no Spark
behavior change from the end users' point of view). Several of us are
Spark and/or Arrow committers. At the moment, the project is under
active development and not yet feature complete. However, some of the
existing functionality is relatively mature and has been running in
production for a while now.


Given the current momentum towards accelerating Spark through native
vectorized execution, we believe open-sourcing this work will benefit
other Spark users too. In addition, we think the project itself can
also leverage the vibrant and strong community behind Arrow and
DataFusion, and grow faster. Because of this, we are exploring the
possibility of contributing this project to the Apache Software
Foundation (ASF) under the Apache Arrow project umbrella.

We'd very much like to hear your opinion on this. Thanks.

Best,
Chao
