Re: [PR] docs: lead README with the Arrow-native framing [datafusion-comet]

via GitHub Wed, 10 Jun 2026 13:49:33 -0700


comphead commented on code in PR #4428:
URL: https://github.com/apache/datafusion-comet/pull/4428#discussion_r3391419669



##########
README.md:
##########
@@ -58,17 +60,22 @@ See the [Comet Benchmarking 
Guide](https://datafusion.apache.org/comet/contribut
 
 ## What Comet Accelerates
 
-Comet replaces Spark operators and expressions with native Rust implementations that run on Apache DataFusion.
-It uses Apache Arrow for zero-copy data transfer between the JVM and native code.
+Comet replaces Spark operators and expressions with implementations that consume and produce Apache Arrow
+batches. Most run as native Rust code on top of Apache DataFusion; some run as JVM code over Arrow batches.
+Either way, query execution stays in the Comet pipeline without falling back to Spark's row-based engine.

Review Comment:
   Comet accelerates Spark workloads by replacing Spark operators and 
expressions with high-performance implementations that process Apache Arrow 
columnar data directly. Most operators are powered by native Rust execution 
built on Apache DataFusion, while others run efficiently in the JVM on Arrow 
batches. This unified columnar execution model keeps processing within the 
Comet engine end-to-end, reducing overhead and delivering faster, more 
efficient query execution without reverting to Spark's traditional row-based 
engine.


