This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git

The following commit(s) were added to refs/heads/asf-staging by this push:
     new 0e6b559  Commit build products
0e6b559 is described below

commit 0e6b55926f1cca7131b389713bb99e61f91c1fb6
Author: Build Pelican (action) <priv...@infra.apache.org>
AuthorDate: Tue Jul 1 13:20:35 2025 +0000

    Commit build products
---
 .../06 => 07/01}/datafusion-comet-0.9.0/index.html |   6 +-
 blog/author/pmc.html                               |  18 +-
 blog/category/blog.html                            |  80 ++++----
 blog/feed.xml                                      |  46 ++---
 blog/feeds/all-en.atom.xml                         | 214 ++++++++++---------
 blog/feeds/blog.atom.xml                           | 214 ++++++++++---------
 blog/feeds/pmc.atom.xml                            | 172 ++++++++---------
 blog/feeds/pmc.rss.xml                             |  10 +-
 blog/index.html                                    |  80 ++++----
 9 files changed, 420 insertions(+), 420 deletions(-)

diff --git a/blog/2025/05/06/datafusion-comet-0.9.0/index.html b/blog/2025/07/01/datafusion-comet-0.9.0/index.html
similarity index 97%
rename from blog/2025/05/06/datafusion-comet-0.9.0/index.html
rename to blog/2025/07/01/datafusion-comet-0.9.0/index.html
index b25882c..a1738e1 100644
--- a/blog/2025/05/06/datafusion-comet-0.9.0/index.html
+++ b/blog/2025/07/01/datafusion-comet-0.9.0/index.html
@@ -4,7 +4,7 @@
 <meta charset="utf-8">
 <meta http-equiv="x-ua-compatible" content="ie=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <title>Apache DataFusion Comet 0.8.0 Release - Apache DataFusion Blog</title>
+ <title>Apache DataFusion Comet 0.9.0 Release - Apache DataFusion Blog</title>
 <link href="/blog/css/bootstrap.min.css" rel="stylesheet">
 <link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
 <link href="/blog/css/headerlink.css" rel="stylesheet">
@@ -40,9 +40,9 @@
 <div class="bg-white p-5 rounded">
 <div class="col-sm-8 mx-auto">
 <h1>
- Apache DataFusion Comet 0.8.0 Release
+ Apache DataFusion Comet 0.9.0 Release
 </h1>
- <p>Posted on: Tue 06 May 2025 by pmc</p>
+ <p>Posted on: Tue 01 July 2025 by pmc</p>
 <!--
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
diff --git
a/blog/author/pmc.html b/blog/author/pmc.html index 132e8f7..baf3c34 100644 --- a/blog/author/pmc.html +++ b/blog/author/pmc.html @@ -53,8 +53,8 @@ <article class="post"> <header> <div class="title"> - <h1><a href="/blog/2025/05/06/datafusion-comet-0.8.0">Apache DataFusion Comet 0.8.0 Release</a></h1> - <p>Posted on: Tue 06 May 2025 by pmc</p> + <h1><a href="/blog/2025/07/01/datafusion-comet-0.9.0">Apache DataFusion Comet 0.9.0 Release</a></h1> + <p>Posted on: Tue 01 July 2025 by pmc</p> <p><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more @@ -73,13 +73,13 @@ See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> -<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately six weeks of development …</p></p> +<p>This release covers approximately ten weeks of development …</p></p> <footer> <ul class="actions"> - <div style="text-align: right"><a href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue Reading</a></div> + <div style="text-align: right"><a href="/blog/2025/07/01/datafusion-comet-0.9.0" class="button medium">Continue Reading</a></div> </ul> <ul class="stats"> </ul> @@ -93,7 +93,7 @@ improved performance and efficiency without requiring any code changes.</p> <article class="post"> <header> <div class="title"> - <h1><a href="/blog/2025/05/06/datafusion-comet-0.9.0">Apache DataFusion Comet 0.8.0 Release</a></h1> + <h1><a href="/blog/2025/05/06/datafusion-comet-0.8.0">Apache DataFusion 
Comet 0.8.0 Release</a></h1> <p>Posted on: Tue 06 May 2025 by pmc</p> <p><!-- {% comment %} @@ -113,13 +113,13 @@ See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development …</p></p> +<p>This release covers approximately six weeks of development …</p></p> <footer> <ul class="actions"> - <div style="text-align: right"><a href="/blog/2025/05/06/datafusion-comet-0.9.0" class="button medium">Continue Reading</a></div> + <div style="text-align: right"><a href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue Reading</a></div> </ul> <ul class="stats"> </ul> diff --git a/blog/category/blog.html b/blog/category/blog.html index 7d09b29..279af8b 100644 --- a/blog/category/blog.html +++ b/blog/category/blog.html @@ -47,6 +47,46 @@ <p><i>Here you can find the latest updates from DataFusion and related projects.</i></p> + <!-- Post --> + <div class="row"> + <div class="callout"> + <article class="post"> + <header> + <div class="title"> + <h1><a href="/blog/2025/07/01/datafusion-comet-0.9.0">Apache DataFusion Comet 0.9.0 Release</a></h1> + <p>Posted on: Tue 01 July 2025 by pmc</p> + <p><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. 
+The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately ten weeks of development …</p></p> + <footer> + <ul class="actions"> + <div style="text-align: right"><a href="/blog/2025/07/01/datafusion-comet-0.9.0" class="button medium">Continue Reading</a></div> + </ul> + <ul class="stats"> + </ul> + </footer> + </article> + </div> + </div> <!-- Post --> <div class="row"> <div class="callout"> @@ -232,46 +272,6 @@ improved performance and efficiency without requiring any code changes.</p> </div> </div> <!-- Post --> - <div class="row"> - <div class="callout"> - <article class="post"> - <header> - <div class="title"> - <h1><a href="/blog/2025/05/06/datafusion-comet-0.9.0">Apache DataFusion Comet 0.8.0 Release</a></h1> - <p>Posted on: Tue 06 May 2025 by pmc</p> - <p><!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. 
-The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development …</p></p> - <footer> - <ul class="actions"> - <div style="text-align: right"><a href="/blog/2025/05/06/datafusion-comet-0.9.0" class="button medium">Continue Reading</a></div> - </ul> - <ul class="stats"> - </ul> - </footer> - </article> - </div> - </div> - <!-- Post --> <div class="row"> <div class="callout"> <article class="post"> diff --git a/blog/feed.xml b/blog/feed.xml index 9d3d88c..72bdc22 100644 --- a/blog/feed.xml +++ b/blog/feed.xml @@ -1,5 +1,26 @@ <?xml version="1.0" encoding="utf-8"?> -<rss version="2.0"><channel><title>Apache DataFusion Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon, 30 Jun 2025 00:00:00 +0000</lastBuildDate><item><title>Using Rust async for Query Execution and Cancelling Long-Running Queries</title><link>https://datafusion.apache.org/blog/2025/06/30/cancellation</link><description><!-- +<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Tue, 01 Jul 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 0.9.0 Release</title><link>https://datafusion.apache.org/blog/2025/07/01/datafusion-comet-0.9.0</link><description><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately ten weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-07-01:/blog/2025/07/01/datafusion-comet-0.9.0</guid><category>blog</category></item><item><title>Using Rust async for Query Execution and Cancelling Long-Running Queries</title><link>https://datafusion.apache.org/blog/20 [...] 
{% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -107,28 +128,7 @@ limitations under the License. <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately six weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.8.0 Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.9.0</li [...] -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.9.0</guid><category>blog</category></item><item><title>User defined Window Functions in DataFusion</title><link>https://datafusion.apache.org/blog/2025/04/19/user-defined-window- [...] +<p>This release covers approximately six weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>User defined Window Functions in DataFusion</title><link>https://datafusion.apache.org/blog/2025/04/19/user-defined-window- [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml index 8ef0c37..86ff6ef 100644 --- a/blog/feeds/all-en.atom.xml +++ b/blog/feeds/all-en.atom.xml @@ -1,5 +1,111 @@ <?xml version="1.0" encoding="utf-8"?> -<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-06-30T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Using Rust async for Query Execution and Cancelling Long-Running Queries</title><link href="https://datafusion.apache.org/blog/202 [...] +<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-01T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.9.0 Release</title><link href="https://datafusion.apache.org/blog/2025/07/01/datafusion-comet-0.9.0" rel [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately ten weeks of development …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately ten weeks of development work and is the result of merging 139 PRs from 24 +contributors. 
See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.9.0.md">change log</a> for more information.</p> +<h2>Release Highlights</h2> +<h3>Complex Type Support in Parquet Scans</h3> +<p>Comet now supports complex types (Structs, Maps, and Arrays) when reading Parquet files. This functionality is not +yet available when reading Parquet files from Apache Iceberg.</p> +<p>This functionality was only available in previous releases when manually specifying one of the new experimental +scan implementations. Comet now automatically chooses the best scan implementation based on the input schema, and no +longer requires manual configuration.</p> +<h3>Complex Type Processing Improvements</h3> +<p>Numerous improvements have been made to complex type support to ensure Spark-compatible behavior when casting between +structs and accessing fields within deeply nested types.</p> +<h3>Shuffle Improvements</h3> +<p>Comet now accelerates a broader range of shuffle operations, leading to more queries running fully natively. In +previous releases, some shuffle operations fell back to Spark to avoid some known bugs in Comet, and these bugs have +now been fixed.</p> +<h3>New Features</h3> +<p>Comet 0.9.0 adds support for the following Spark expressions:</p> +<ul> +<li>ArrayDistinct</li> +<li>ArrayMax</li> +<li>ArrayRepeat</li> +<li>ArrayUnion</li> +<li>BitCount</li> +<li>BitNot</li> +<li>Expm1</li> +<li>MapValues</li> +<li>Signum</li> +<li>ToPrettyString</li> +<li>map[]</li> +</ul> +<h3>Improved Spark SQL Test Coverage</h3> +<p>Comet now passes 97% of the Spark SQL test suite, with more than 24,000 tests passing (based on testing against +Spark 3.5.6).</p> +<p>This release contains numerous bug fixes to achieve this coverage, including improved support for exchange reuse +when AQE is enabled. 
The remaining ignored tests are mostly related to metric differences or tests irrelevant to +Comet, such as tests for whole-stage code generation.</p> +<p>| Module | Passed | Ignored | Canceled | Total | +| -------- | ------ | ------- | -------- | ------ | +| catalyst | 7,232 | 5 | 1 | 7,238 | +| core-1 | 9,186 | 246 | 6 | 9,438 | +| core-2 | 2,649 | 393 | 0 | 3,042 | +| core-3 | 1,757 | 136 | 16 | 1,909 | +| hive-1 | 2,174 | 14 | 4 | 2,192 | +| hive-2 | 19 | 1 | 4 | 24 | +| hive-3 | 1,058 | 11 | 4 | 1,073 | +| Total | 24,075 | 806 | 31 | 24,912 |</p> +<h3>Memory &amp; Performance Tracing</h3> +<p>Comet now provides a tracing feature for analyzing performance and off-heap versus on-heap memory usage. See the +<a href="https://datafusion.apache.org/comet/contributor-guide/tracing.html">Comet Tracing Guide</a> for more information.</p> +<p><img alt="Comet Tracing" class="img-responsive" src="/blog/images/comet-0.9.0/tracing.png" width="100%"/></p> +<h3>Spark Compatibility</h3> +<ul> +<li>Spark 3.4.3 with JDK 11 &amp; 17, Scala 2.12 &amp; 2.13</li> +<li>Spark 3.5.4 through 3.5.6 with JDK 11 &amp; 17, Scala 2.12 &amp; 2.13</li> +<li>Experimental support for Spark 4.0.0 with JDK 17, Scala 2.13</li> +</ul> +<p>Note that Java 8 support was removed from this release because Apache Arrow no longer supports it.</p> +<h2>Getting Involved</h2> +<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion +project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> +<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or +performance regressions that you find. 
See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing +Comet.</p> +<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Using Rust async for Query Execution and Cancelling Long-Running Queries</title><link href="https://datafusion.apache.org/blog/2025/06/30/cancellation" rel="alternate"></link><published>2025-06-30T00:00:00+00:00</published><updated>2025-06-30T00:00:00+00:00</updat [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -1086,112 +1192,6 @@ project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIA <p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing Comet.</p> -<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.9.0" rel="alternate"></link><published>2025-05-06T00:00:00+00:00</published><updated>2025-05-06T00:00:00+00:00</updated><author><name>pmc</nam [...] -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. 
-The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development …</p></summary><content type="html"><!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development work and is the result of merging 139 PRs from 24 -contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.9.0.md">change log</a> for more information.</p> -<h2>Release Highlights</h2> -<h3>Complex Type Support in Parquet Scans</h3> -<p>Comet now supports complex types (Structs, Maps, and Arrays) when reading Parquet files. This functionality is not -yet available when reading Parquet files from Apache Iceberg.</p> -<p>This functionality was only available in previous releases when manually specifying one of the new experimental -scan implementations. Comet now automatically chooses the best scan implementation based on the input schema, and no -longer requires manual configuration.</p> -<h3>Complex Type Processing Improvements</h3> -<p>Numerous improvements have been made to complex type support to ensure Spark-compatible behavior when casting between -structs and accessing fields within deeply nested types.</p> -<h3>Shuffle Improvements</h3> -<p>Comet now accelerates a broader range of shuffle operations, leading to more queries running fully natively. 
In -previous releases, some shuffle operations fell back to Spark to avoid some known bugs in Comet, and these bugs have -now been fixed.</p> -<h3>New Features</h3> -<p>Comet 0.9.0 adds support for the following Spark expressions:</p> -<ul> -<li>ArrayDistinct</li> -<li>ArrayMax</li> -<li>ArrayRepeat</li> -<li>ArrayUnion</li> -<li>BitCount</li> -<li>BitNot</li> -<li>Expm1</li> -<li>MapValues</li> -<li>Signum</li> -<li>ToPrettyString</li> -<li>map[]</li> -</ul> -<h3>Improved Spark SQL Test Coverage</h3> -<p>Comet now passes 97% of the Spark SQL test suite, with more than 24,000 tests passing (based on testing against -Spark 3.5.6).</p> -<p>This release contains numerous bug fixes to achieve this coverage, including improved support for exchange reuse -when AQE is enabled. The remaining ignored tests are mostly related to metric differences or tests irrelevant to -Comet, such as tests for whole-stage code generation.</p> -<p>| Module | Passed | Ignored | Canceled | Total | -| -------- | ------ | ------- | -------- | ------ | -| catalyst | 7,232 | 5 | 1 | 7,238 | -| core-1 | 9,186 | 246 | 6 | 9,438 | -| core-2 | 2,649 | 393 | 0 | 3,042 | -| core-3 | 1,757 | 136 | 16 | 1,909 | -| hive-1 | 2,174 | 14 | 4 | 2,192 | -| hive-2 | 19 | 1 | 4 | 24 | -| hive-3 | 1,058 | 11 | 4 | 1,073 | -| Total | 24,075 | 806 | 31 | 24,912 |</p> -<h3>Memory &amp; Performance Tracing</h3> -<p>Comet now provides a tracing feature for analyzing performance and off-heap versus on-heap memory usage. 
See the -<a href="https://datafusion.apache.org/comet/contributor-guide/tracing.html">Comet Tracing Guide</a> for more information.</p> -<p><img alt="Comet Tracing" class="img-responsive" src="/blog/images/comet-0.9.0/tracing.png" width="100%"/></p> -<h3>Spark Compatibility</h3> -<ul> -<li>Spark 3.4.3 with JDK 11 &amp; 17, Scala 2.12 &amp; 2.13</li> -<li>Spark 3.5.4 through 3.5.6 with JDK 11 &amp; 17, Scala 2.12 &amp; 2.13</li> -<li>Experimental support for Spark 4.0.0 with JDK 17, Scala 2.13</li> -</ul> -<p>Note that Java 8 support was removed from this release because Apache Arrow no longer supports it.</p> -<h2>Getting Involved</h2> -<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion -project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> -<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or -performance regressions that you find. See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing -Comet.</p> <p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>User defined Window Functions in DataFusion</title><link href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions" rel="alternate"></link><published>2025-04-19T00:00:00+00:00</published><updated>2025-04-19T00:00:00+00:00</updated><author>< [...] 
{% comment %} Licensed to the Apache Software Foundation (ASF) under one or more diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml index f468759..b1d5c0a 100644 --- a/blog/feeds/blog.atom.xml +++ b/blog/feeds/blog.atom.xml @@ -1,5 +1,111 @@ <?xml version="1.0" encoding="utf-8"?> -<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-06-30T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Using Rust async for Query Execution and Cancelling Long-Running Queries</title><link href="https://datafusion.apache.org/blo [...] +<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-01T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.9.0 Release</title><link href="https://datafusion.apache.org/blog/2025/07/01/datafusion-comet-0.9.0 [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately ten weeks of development …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately ten weeks of development work and is the result of merging 139 PRs from 24 +contributors. 
See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.9.0.md">change log</a> for more information.</p> +<h2>Release Highlights</h2> +<h3>Complex Type Support in Parquet Scans</h3> +<p>Comet now supports complex types (Structs, Maps, and Arrays) when reading Parquet files. This functionality is not +yet available when reading Parquet files from Apache Iceberg.</p> +<p>This functionality was only available in previous releases when manually specifying one of the new experimental +scan implementations. Comet now automatically chooses the best scan implementation based on the input schema, and no +longer requires manual configuration.</p> +<h3>Complex Type Processing Improvements</h3> +<p>Numerous improvements have been made to complex type support to ensure Spark-compatible behavior when casting between +structs and accessing fields within deeply nested types.</p> +<h3>Shuffle Improvements</h3> +<p>Comet now accelerates a broader range of shuffle operations, leading to more queries running fully natively. In +previous releases, some shuffle operations fell back to Spark to avoid some known bugs in Comet, and these bugs have +now been fixed.</p> +<h3>New Features</h3> +<p>Comet 0.9.0 adds support for the following Spark expressions:</p> +<ul> +<li>ArrayDistinct</li> +<li>ArrayMax</li> +<li>ArrayRepeat</li> +<li>ArrayUnion</li> +<li>BitCount</li> +<li>BitNot</li> +<li>Expm1</li> +<li>MapValues</li> +<li>Signum</li> +<li>ToPrettyString</li> +<li>map[]</li> +</ul> +<h3>Improved Spark SQL Test Coverage</h3> +<p>Comet now passes 97% of the Spark SQL test suite, with more than 24,000 tests passing (based on testing against +Spark 3.5.6).</p> +<p>This release contains numerous bug fixes to achieve this coverage, including improved support for exchange reuse +when AQE is enabled. 
The remaining ignored tests are mostly related to metric differences or tests irrelevant to +Comet, such as tests for whole-stage code generation.</p> +<p>| Module | Passed | Ignored | Canceled | Total | +| -------- | ------ | ------- | -------- | ------ | +| catalyst | 7,232 | 5 | 1 | 7,238 | +| core-1 | 9,186 | 246 | 6 | 9,438 | +| core-2 | 2,649 | 393 | 0 | 3,042 | +| core-3 | 1,757 | 136 | 16 | 1,909 | +| hive-1 | 2,174 | 14 | 4 | 2,192 | +| hive-2 | 19 | 1 | 4 | 24 | +| hive-3 | 1,058 | 11 | 4 | 1,073 | +| Total | 24,075 | 806 | 31 | 24,912 |</p> +<h3>Memory &amp; Performance Tracing</h3> +<p>Comet now provides a tracing feature for analyzing performance and off-heap versus on-heap memory usage. See the +<a href="https://datafusion.apache.org/comet/contributor-guide/tracing.html">Comet Tracing Guide</a> for more information.</p> +<p><img alt="Comet Tracing" class="img-responsive" src="/blog/images/comet-0.9.0/tracing.png" width="100%"/></p> +<h3>Spark Compatibility</h3> +<ul> +<li>Spark 3.4.3 with JDK 11 &amp; 17, Scala 2.12 &amp; 2.13</li> +<li>Spark 3.5.4 through 3.5.6 with JDK 11 &amp; 17, Scala 2.12 &amp; 2.13</li> +<li>Experimental support for Spark 4.0.0 with JDK 17, Scala 2.13</li> +</ul> +<p>Note that Java 8 support was removed from this release because Apache Arrow no longer supports it.</p> +<h2>Getting Involved</h2> +<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion +project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> +<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or +performance regressions that you find. 
See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing +Comet.</p> +<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Using Rust async for Query Execution and Cancelling Long-Running Queries</title><link href="https://datafusion.apache.org/blog/2025/06/30/cancellation" rel="alternate"></link><published>2025-06-30T00:00:00+00:00</published><updated>2025-06-30T00:00:00+00:00</updat [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -1086,112 +1192,6 @@ project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIA <p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing Comet.</p> -<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.9.0" rel="alternate"></link><published>2025-05-06T00:00:00+00:00</published><updated>2025-05-06T00:00:00+00:00</updated><author><name>pmc</nam [...] -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. 
-The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development …</p></summary><content type="html"><!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development work and is the result of merging 139 PRs from 24 -contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.9.0.md">change log</a> for more information.</p> -<h2>Release Highlights</h2> -<h3>Complex Type Support in Parquet Scans</h3> -<p>Comet now supports complex types (Structs, Maps, and Arrays) when reading Parquet files. This functionality is not -yet available when reading Parquet files from Apache Iceberg.</p> -<p>This functionality was only available in previous releases when manually specifying one of the new experimental -scan implementations. Comet now automatically chooses the best scan implementation based on the input schema, and no -longer requires manual configuration.</p> -<h3>Complex Type Processing Improvements</h3> -<p>Numerous improvements have been made to complex type support to ensure Spark-compatible behavior when casting between -structs and accessing fields within deeply nested types.</p> -<h3>Shuffle Improvements</h3> -<p>Comet now accelerates a broader range of shuffle operations, leading to more queries running fully natively. 
In -previous releases, some shuffle operations fell back to Spark to avoid some known bugs in Comet, and these bugs have -now been fixed.</p> -<h3>New Features</h3> -<p>Comet 0.9.0 adds support for the following Spark expressions:</p> -<ul> -<li>ArrayDistinct</li> -<li>ArrayMax</li> -<li>ArrayRepeat</li> -<li>ArrayUnion</li> -<li>BitCount</li> -<li>BitNot</li> -<li>Expm1</li> -<li>MapValues</li> -<li>Signum</li> -<li>ToPrettyString</li> -<li>map[]</li> -</ul> -<h3>Improved Spark SQL Test Coverage</h3> -<p>Comet now passes 97% of the Spark SQL test suite, with more than 24,000 tests passing (based on testing against -Spark 3.5.6).</p> -<p>This release contains numerous bug fixes to achieve this coverage, including improved support for exchange reuse -when AQE is enabled. The remaining ignored tests are mostly related to metric differences or tests irrelevant to -Comet, such as tests for whole-stage code generation.</p> -<p>| Module | Passed | Ignored | Canceled | Total | -| -------- | ------ | ------- | -------- | ------ | -| catalyst | 7,232 | 5 | 1 | 7,238 | -| core-1 | 9,186 | 246 | 6 | 9,438 | -| core-2 | 2,649 | 393 | 0 | 3,042 | -| core-3 | 1,757 | 136 | 16 | 1,909 | -| hive-1 | 2,174 | 14 | 4 | 2,192 | -| hive-2 | 19 | 1 | 4 | 24 | -| hive-3 | 1,058 | 11 | 4 | 1,073 | -| Total | 24,075 | 806 | 31 | 24,912 |</p> -<h3>Memory &amp; Performance Tracing</h3> -<p>Comet now provides a tracing feature for analyzing performance and off-heap versus on-heap memory usage. 
See the -<a href="https://datafusion.apache.org/comet/contributor-guide/tracing.html">Comet Tracing Guide</a> for more information.</p> -<p><img alt="Comet Tracing" class="img-responsive" src="/blog/images/comet-0.9.0/tracing.png" width="100%"/></p> -<h3>Spark Compatibility</h3> -<ul> -<li>Spark 3.4.3 with JDK 11 &amp; 17, Scala 2.12 &amp; 2.13</li> -<li>Spark 3.5.4 through 3.5.6 with JDK 11 &amp; 17, Scala 2.12 &amp; 2.13</li> -<li>Experimental support for Spark 4.0.0 with JDK 17, Scala 2.13</li> -</ul> -<p>Note that Java 8 support was removed from this release because Apache Arrow no longer supports it.</p> -<h2>Getting Involved</h2> -<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion -project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> -<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or -performance regressions that you find. See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing -Comet.</p> <p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>User defined Window Functions in DataFusion</title><link href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions" rel="alternate"></link><published>2025-04-19T00:00:00+00:00</published><updated>2025-04-19T00:00:00+00:00</updated><author>< [...] 
{% comment %} Licensed to the Apache Software Foundation (ASF) under one or more diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml index 8b6fc84..b7ed85a 100644 --- a/blog/feeds/pmc.atom.xml +++ b/blog/feeds/pmc.atom.xml @@ -1,90 +1,5 @@ <?xml version="1.0" encoding="utf-8"?> -<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - pmc</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-05-06T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" [...] -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately six weeks of development …</p></summary><content type="html"><!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately six weeks of development work and is the result of merging 81 PRs from 11 -contributors. 
See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change log</a> for more information.</p> -<h2>Release Highlights</h2> -<h3>Performance &amp; Stability</h3> -<ul> -<li>Up to 4x speedup in jobs using <code>dropDuplicates</code>, thanks to optimizations in the <code>first_value</code> and <code>last_value</code> - aggregate functions in DataFusion 47.0.0.</li> -<li>Introduction of a global Tokio runtime, which resolves potential deadlocks in certain multi-task scenarios.</li> -</ul> -<h2>Native Shuffle Improvements</h2> -<p>Significant enhancements to the native shuffle mechanism include:</p> -<ul> -<li>Lower memory usage through using <code>interleave_record_batches</code> instead of using array builders.</li> -<li>Support for complex types in shuffle data (note: hash partition expressions still require primitive types).</li> -<li>Reclaimable shuffle files, reducing disk pressure.</li> -<li>Respects <code>spark.local.dir</code> for temporary storage.</li> -<li>Per-task shuffle metrics are now available, providing better visibility into execution behavior.</li> -</ul> -<h2>Experimental Support for DataFusion&rsquo;s Parquet Scan</h2> -<p>It is now possible to configure Comet to use DataFusion&rsquo;s Parquet reader instead of Comet&rsquo;s current Parquet reader. 
This -has the advantage of supporting complex types, and also has performance optimizations that are not present in Comet's -existing reader.</p> -<p>This release continues with the ongoing improvements and bug fixes and supports more use cases, but there are still -some known issues:</p> -<ul> -<li>There are schema coercion bugs for nested types containing INT96 columns, which can cause incorrect results.</li> -<li>There are compatibility issues when reading integer values that are larger than their type annotation, such as the - value 1024 being stored in a field annotated as int(8).</li> -<li>A small number of Spark SQL tests remain unsupported (<a href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a>).</li> -</ul> -<p>To enable DataFusion&rsquo;s Parquet reader, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment -variable <code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p> -<h2>Updates to Supported Spark Versions</h2> -<ul> -<li>Added support for Spark 3.5.5</li> -<li>Dropped support for Spark 3.3.x</li> -</ul> -<h2>Getting Involved</h2> -<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion -project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> -<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or -performance regressions that you find. 
See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing -Comet.</p> -<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.9.0" rel="alternate"></link><published>2025-05-06T00:00:00+00:00</published><updated>2025-05-06T00:00:00+00:00</updated><author><name>pmc</nam [...] +<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - pmc</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-01T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.9.0 Release</title><link href="https://datafusion.apache.org/blog/2025/07/01/datafusion-comet-0.9.0" [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -190,6 +105,91 @@ project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIA <p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. 
See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing Comet.</p> +<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.8.0 Release</title><link href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" rel="alternate"></link><published>2025-05-06T00:00:00+00:00</published><updated>2025-05-06T00:00:00+00:00</updated><author><name>pmc</nam [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately six weeks of development …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately six weeks of development work and is the result of merging 81 PRs from 11 +contributors. 
See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change log</a> for more information.</p> +<h2>Release Highlights</h2> +<h3>Performance &amp; Stability</h3> +<ul> +<li>Up to 4x speedup in jobs using <code>dropDuplicates</code>, thanks to optimizations in the <code>first_value</code> and <code>last_value</code> + aggregate functions in DataFusion 47.0.0.</li> +<li>Introduction of a global Tokio runtime, which resolves potential deadlocks in certain multi-task scenarios.</li> +</ul> +<h2>Native Shuffle Improvements</h2> +<p>Significant enhancements to the native shuffle mechanism include:</p> +<ul> +<li>Lower memory usage through using <code>interleave_record_batches</code> instead of using array builders.</li> +<li>Support for complex types in shuffle data (note: hash partition expressions still require primitive types).</li> +<li>Reclaimable shuffle files, reducing disk pressure.</li> +<li>Respects <code>spark.local.dir</code> for temporary storage.</li> +<li>Per-task shuffle metrics are now available, providing better visibility into execution behavior.</li> +</ul> +<h2>Experimental Support for DataFusion&rsquo;s Parquet Scan</h2> +<p>It is now possible to configure Comet to use DataFusion&rsquo;s Parquet reader instead of Comet&rsquo;s current Parquet reader. 
This +has the advantage of supporting complex types, and also has performance optimizations that are not present in Comet's +existing reader.</p> +<p>This release continues with the ongoing improvements and bug fixes and supports more use cases, but there are still +some known issues:</p> +<ul> +<li>There are schema coercion bugs for nested types containing INT96 columns, which can cause incorrect results.</li> +<li>There are compatibility issues when reading integer values that are larger than their type annotation, such as the + value 1024 being stored in a field annotated as int(8).</li> +<li>A small number of Spark SQL tests remain unsupported (<a href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a>).</li> +</ul> +<p>To enable DataFusion&rsquo;s Parquet reader, either set <code>spark.comet.scan.impl=native_datafusion</code> or set the environment +variable <code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p> +<h2>Updates to Supported Spark Versions</h2> +<ul> +<li>Added support for Spark 3.5.5</li> +<li>Dropped support for Spark 3.3.x</li> +</ul> +<h2>Getting Involved</h2> +<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion +project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> +<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or +performance regressions that you find. 
See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing +Comet.</p> <p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.7.0 Release</title><link href="https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0" rel="alternate"></link><published>2025-03-20T00:00:00+00:00</published><updated>2025-03-20T00:00:00+00:00</updated><author><name>pmc</nam [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more diff --git a/blog/feeds/pmc.rss.xml b/blog/feeds/pmc.rss.xml index 7be1820..53713b7 100644 --- a/blog/feeds/pmc.rss.xml +++ b/blog/feeds/pmc.rss.xml @@ -1,5 +1,5 @@ <?xml version="1.0" encoding="utf-8"?> -<rss version="2.0"><channel><title>Apache DataFusion Blog - pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Tue, 06 May 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 0.8.0 Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0</link><description><!-- +<rss version="2.0"><channel><title>Apache DataFusion Blog - pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Tue, 01 Jul 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 0.9.0 Release</title><link>https://datafusion.apache.org/blog/2025/07/01/datafusion-comet-0.9.0</link><description><!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -17,10 +17,10 @@ See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> -<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately six weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.8.0 Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.9.0</li [...] +<p>This release covers approximately ten weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-07-01:/blog/2025/07/01/datafusion-comet-0.9.0</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.8.0 Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0</li [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -38,10 +38,10 @@ See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.9.0</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.7.0 Release</title><link>https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0</li [...] +<p>This release covers approximately six weeks of development …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06 May 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.7.0 Release</title><link>https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0</li [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with diff --git a/blog/index.html b/blog/index.html index 0f4cb91..83908d4 100644 --- a/blog/index.html +++ b/blog/index.html @@ -44,6 +44,46 @@ <p><i>Here you can find the latest updates from DataFusion and related projects.</i></p> + <!-- Post --> + <div class="row"> + <div class="callout"> + <article class="post"> + <header> + <div class="title"> + <h1><a href="/blog/2025/07/01/datafusion-comet-0.9.0">Apache DataFusion Comet 0.9.0 Release</a></h1> + <p>Posted on: Tue 01 July 2025 by pmc</p> + <p><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>This release covers approximately ten weeks of development …</p></p> + <footer> + <ul class="actions"> + <div style="text-align: right"><a href="/blog/2025/07/01/datafusion-comet-0.9.0" class="button medium">Continue Reading</a></div> + </ul> + <ul class="stats"> + </ul> + </footer> + </article> + </div> + </div> <!-- Post --> <div class="row"> <div class="callout"> @@ -229,46 +269,6 @@ improved performance and efficiency without requiring any code changes.</p> </div> </div> <!-- Post --> - <div class="row"> - <div class="callout"> - <article class="post"> - <header> - <div class="title"> - <h1><a href="/blog/2025/05/06/datafusion-comet-0.9.0">Apache DataFusion Comet 0.8.0 Release</a></h1> - <p>Posted on: Tue 06 May 2025 by pmc</p> - <p><!-- -{% comment %} -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to you under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - -http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
-{% endcomment %} ---> -<p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> -<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for -improved performance and efficiency without requiring any code changes.</p> -<p>This release covers approximately ten weeks of development …</p></p> - <footer> - <ul class="actions"> - <div style="text-align: right"><a href="/blog/2025/05/06/datafusion-comet-0.9.0" class="button medium">Continue Reading</a></div> - </ul> - <ul class="stats"> - </ul> - </footer> - </article> - </div> - </div> - <!-- Post --> <div class="row"> <div class="callout"> <article class="post">