Re: [PR] blog: Apache DataFusion Comet 0.17.0 release post [datafusion-site]

via GitHub Mon, 22 Jun 2026 06:29:27 -0700


mbutrovich commented on code in PR #198:
URL: https://github.com/apache/datafusion-site/pull/198#discussion_r3452529471



##########
content/blog/2026-06-20-datafusion-comet-0.17.0.md:
##########
@@ -0,0 +1,245 @@
+---
+layout: post
+title: Apache DataFusion Comet 0.17.0 Release
+date: 2026-06-20
+author: pmc
+categories: [subprojects]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+The Apache DataFusion PMC is pleased to announce version 0.17.0 of the [Comet](https://datafusion.apache.org/comet/) subproject.
+
+This release covers approximately five weeks of development work and is the result of merging 192 PRs from 19
+contributors. See the [change log] for more information.
+
+[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.17.0.md
+
+## Fewer Fallbacks to Spark
+
+The headline feature of 0.17.0 is a new mechanism that keeps more of your query running inside Comet instead
+of falling back to Spark: the **JVM codegen dispatcher**.
+
+Comet has always fallen back to Spark row-based execution whenever an expression had no native Rust implementation, or where the
+Rust implementation could diverge from Spark on edge cases. A fallback is correct, a columnar-to-row
+conversion is needed to feed the data into Spark's row-based operators and this adds overhead when processing billions of rows of data.
+
+The codegen dispatcher avoids the fallback to row-based processing by running Spark's own
+generated code (`doGenCode`) inside the Comet pipeline, operating directly on Arrow batches. The result is a
+JVM-implemented Arrow-native expression: the data stays in Arrow format, and because the expression is
+evaluated by Spark's own code, the result is guaranteed to match Spark exactly across every supported Spark
+version. When the dispatcher is disabled, Comet falls back cleanly as before.

Review Comment:
   ```suggestion
   version. When the dispatcher is disabled, Comet falls back as before.
   ```



##########
content/blog/2026-06-20-datafusion-comet-0.17.0.md:
##########
@@ -0,0 +1,245 @@
+---
+layout: post
+title: Apache DataFusion Comet 0.17.0 Release
+date: 2026-06-20
+author: pmc
+categories: [subprojects]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+The Apache DataFusion PMC is pleased to announce version 0.17.0 of the [Comet](https://datafusion.apache.org/comet/) subproject.
+
+This release covers approximately five weeks of development work and is the result of merging 192 PRs from 19
+contributors. See the [change log] for more information.
+
+[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.17.0.md
+
+## Fewer Fallbacks to Spark
+
+The headline feature of 0.17.0 is a new mechanism that keeps more of your query running inside Comet instead
+of falling back to Spark: the **JVM codegen dispatcher**.
+
+Comet has always fallen back to Spark row-based execution whenever an expression had no native Rust implementation, or where the
+Rust implementation could diverge from Spark on edge cases. A fallback is correct, a columnar-to-row
+conversion is needed to feed the data into Spark's row-based operators and this adds overhead when processing billions of rows of data.

Review Comment:
   ```suggestion
   Rust implementation could diverge from Spark on edge cases. A fallback is correct, but a columnar-to-row
   conversion is needed to feed the data into Spark's row-based operators, which adds overhead when processing billions of rows of data.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

