This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push:
new 62ce4c5 Commit build products
62ce4c5 is described below
commit 62ce4c5fbcd0da207f16a0396369aab32b92358d
Author: Build Pelican (action) <[email protected]>
AuthorDate: Mon Feb 17 20:11:07 2025 +0000
Commit build products
---
.../02/02/datafusion-ballista-43.0.0/index.html | 130 ++++++++++++++
.../2025/02/07/datafusion-python-44.0.0/index.html | 138 ---------------
blog/2025/02/17/datafusion-comet-0.6.0/index.html | 115 ++++++++++++
blog/author/milenkovicm.html | 109 ++++++++++++
blog/author/pmc.html | 40 +++++
blog/author/timsaucer.html | 41 -----
blog/category/blog.html | 55 +++++-
blog/feed.xml | 32 +++-
blog/feeds/all-en.atom.xml | 192 ++++++++++++++-------
blog/feeds/blog.atom.xml | 192 ++++++++++++++-------
blog/feeds/milenkovicm.atom.xml | 92 ++++++++++
blog/feeds/milenkovicm.rss.xml | 23 +++
blog/feeds/pmc.atom.xml | 77 ++++++++-
blog/feeds/pmc.rss.xml | 23 ++-
blog/feeds/timsaucer.atom.xml | 101 +----------
blog/feeds/timsaucer.rss.xml | 24 +--
.../datafusion-ballista-43.0.0/ballista-logo.png | Bin 0 -> 65501 bytes
.../datafusion-ballista-43.0.0/tpch_allqueries.png | Bin 0 -> 27455 bytes
.../tpch_queries_compare.png | Bin 0 -> 32843 bytes
.../tpch_queries_speedup_rel.png | Bin 0 -> 47169 bytes
blog/index.html | 55 +++++-
21 files changed, 987 insertions(+), 452 deletions(-)
diff --git a/blog/2025/02/02/datafusion-ballista-43.0.0/index.html
b/blog/2025/02/02/datafusion-ballista-43.0.0/index.html
new file mode 100644
index 0000000..25e3a54
--- /dev/null
+++ b/blog/2025/02/02/datafusion-ballista-43.0.0/index.html
@@ -0,0 +1,130 @@
+<!doctype html>
+<html class="no-js" lang="en" dir="ltr">
+ <head>
+ <meta charset="utf-8">
+ <meta http-equiv="x-ua-compatible" content="ie=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>Apache DataFusion Ballista 43.0.0 Released - Apache DataFusion
Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script> </head>
+ <body class="d-flex flex-column h-100">
+ <main class="flex-shrink-0">
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth
navbar example">
+ <div class="container-fluid">
+ <a class="navbar-brand" href="/blog"><img
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache
DataFusion Blog</a>
+ <button class="navbar-toggler" type="button" data-bs-toggle="collapse"
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false"
aria-label="Toggle navigation">
+ <span class="navbar-toggler-icon"></span>
+ </button>
+
+ <div class="collapse navbar-collapse" id="navbarADP">
+ <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/about.html">About</a>
+ </li>
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/feed.xml">RSS</a>
+ </li>
+ </ul>
+ </div>
+ </div>
+</nav>
+
+
+<!-- page contents -->
+<div id="contents">
+ <div class="bg-white p-5 rounded">
+ <div class="col-sm-8 mx-auto">
+ <h1>
+ Apache DataFusion Ballista 43.0.0 Released
+ </h1>
+ <p>Posted on: Sun 02 February 2025 by milenkovicm</p>
+ <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be scaled
out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of this release has been tighter integration with
the DataFusion ecosystem, with the aim of offering the same level of
flexibility as DataFusion.</p>
+<p>In recent months, our development efforts have been directed toward
providing a robust and extensible Ballista API. This new API empowers end-users
to tailor Ballista's core functionality to their specific use cases. As a
result, we have deprecated several experimental features from the Ballista
core, allowing users to reintroduce them as custom extensions outside the core
framework. This shift reduces the maintenance burden on Ballista's core
maintainers and paves the way for optiona [...]
+<p>The most significant enhancement in this release is the deprecation of
<code>BallistaContext</code>, which has been superseded by the DataFusion
<code>SessionContext</code>. This change enables DataFusion applications
written in Rust to execute on a Ballista cluster with minimal modifications.
Beyond simplifying migration and reducing maintenance overhead, this update
introduces distributed write functionality to Ballista for the first time,
significantly enhancing its capabilities.</p>
+<div class="codehilite"><pre><span></span><code><span
class="k">use</span><span class="w"> </span><span
class="n">ballista</span>::<span class="n">prelude</span>::<span
class="o">*</span><span class="p">;</span><span class="w"></span>
+<span class="k">use</span><span class="w"> </span><span
class="n">datafusion</span>::<span class="n">prelude</span>::<span
class="o">*</span><span class="p">;</span><span class="w"></span>
+
+<span class="cp">#[tokio::main]</span><span class="w"></span>
+<span class="k">async</span><span class="w"> </span><span class="k">fn</span>
<span class="nf">main</span><span class="p">()</span><span class="w">
</span>-&gt; <span class="nc">datafusion</span>::<span
class="n">error</span>::<span class="nb">Result</span><span
class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span><span
class="w"> </span><span class="p">{</span><span class="w"></span>
+
+<span class="w"> </span><span class="c1">// Instead of creating classic
SessionContext</span>
+<span class="w"> </span><span class="c1">// let ctx =
SessionContext::new();</span>
+
+<span class="w"> </span><span class="c1">// create DataFusion SessionContext
with ballista standalone cluster started</span>
+<span class="w"> </span><span class="c1">// let ctx =
SessionContext::standalone().await;</span>
+
+<span class="w"> </span><span class="c1">// create DataFusion SessionContext
with ballista remote cluster started</span>
+<span class="w"> </span><span class="kd">let</span><span class="w">
</span><span class="n">ctx</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">SessionContext</span>::<span class="n">remote</span><span
class="p">(</span><span class="s">"df://localhost:50050"</span><span
class="p">).</span><span class="k">await</span><span class="p">;</span><span
class="w"></span>
+
+<span class="w"> </span><span class="c1">// register the table</span>
+<span class="w"> </span><span class="n">ctx</span><span
class="p">.</span><span class="n">register_csv</span><span
class="p">(</span><span class="s">"example"</span><span class="p">,</span><span
class="w"> </span><span class="s">"tests/data/example.csv"</span><span
class="p">,</span><span class="w"> </span><span
class="n">CsvReadOptions</span>::<span class="n">new</span><span
class="p">()).</span><span class="k">await</span><span class="o">?</span><span
class="p">;</span><span class="w" [...]
+
+<span class="w"> </span><span class="c1">// create a plan to run a SQL
query</span>
+<span class="w"> </span><span class="kd">let</span><span class="w">
</span><span class="n">df</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span class="n">ctx</span><span
class="p">.</span><span class="n">sql</span><span class="p">(</span><span
class="s">"SELECT a, MIN(b) FROM example WHERE a &lt;= b GROUP BY a LIMIT
100"</span><span class="p">).</span><span class="k">await</span><span
class="o">?</span><span class="p">;</span><span class="w"></span>
+
+<span class="w"> </span><span class="c1">// execute and print results</span>
+<span class="w"> </span><span class="n">df</span><span
class="p">.</span><span class="n">show</span><span class="p">().</span><span
class="k">await</span><span class="o">?</span><span class="p">;</span><span
class="w"></span>
+<span class="w"> </span><span class="nb">Ok</span><span
class="p">(())</span><span class="w"></span>
+<span class="p">}</span><span class="w"></span>
+</code></pre></div>
+<p>Additionally, Ballista’s versioning scheme has been aligned with that
of DataFusion, ensuring that Ballista's version number reflects the compatible
DataFusion version.</p>
+<p>At the moment there is still a feature gap between DataFusion and Ballista,
which we will try to bridge in future releases.</p>
+<h3>Removal of Experimental Features</h3>
+<p>Ballista had grown in scope to include several experimental features in
various states of completeness. Some features have been removed from this
release in an effort to strip Ballista back to its core and make it easier to
maintain and extend.</p>
+<p>Specifically, the caching subsystem, predefined object store registry,
plugin subsystem, key-value stores for persistent scheduler state, and the UI
have been removed.</p>
+<h3>Performance &amp; Scalability</h3>
+<p>Ballista has benefited significantly from the advancements made in the
DataFusion project over the past year. Benchmark results show notable
performance improvements:</p>
+<p>Per query comparison:</p>
+<p><img alt="Per query comparison" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png"
width="100%"/></p>
+<p>Relative speedup:</p>
+<p><img alt="Relative speedup graph" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png"
width="100%"/></p>
+<p>The overall speedup is 2.9x:</p>
+<p><img alt="Overall speedup" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png"
width="50%"/></p>
+<h3>New Logo</h3>
+<p>Ballista now has a new logo, which is visually consistent with those of
other DataFusion projects.</p>
+<p><img alt="New logo" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/ballista-logo.png"
width="50%"/></p>
+<h2>Roadmap</h2>
+<p>Moving forward, Ballista will adopt the same release cadence as DataFusion,
providing synchronized updates across the ecosystem.
+Currently, there is no established long-term roadmap for Ballista. A plan will
be formulated in the coming months based on community feedback and the
availability of additional maintainers.</p>
+<p>In the short term, development efforts will concentrate on closing the
feature gap between DataFusion and Ballista. Key priorities include
implementing support for <code>INSERT INTO</code>, enabling table
<code>URL</code> functionality, and achieving deeper integration with the
Python ecosystem.</p>
+ </div>
+ </div>
+ </div>
+ <!-- footer -->
+ <div class="row">
+ <div class="large-12 medium-12 columns">
+ <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+ Copyright 2025, <a href="https://www.apache.org/">The Apache
Software Foundation</a>, Licensed under the <a
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version
2.0</a>.<br/>
+ Apache® and the Apache feather logo are trademarks of The Apache
Software Foundation.
+ </p>
+ </div>
+ </div>
+ <script src="/blog/js/bootstrap.bundle.min.js"></script> </main>
+ </body>
+</html>
diff --git a/blog/2025/02/07/datafusion-python-44.0.0/index.html
b/blog/2025/02/07/datafusion-python-44.0.0/index.html
deleted file mode 100644
index 7fe3799..0000000
--- a/blog/2025/02/07/datafusion-python-44.0.0/index.html
+++ /dev/null
@@ -1,138 +0,0 @@
-<!doctype html>
-<html class="no-js" lang="en" dir="ltr">
- <head>
- <meta charset="utf-8">
- <meta http-equiv="x-ua-compatible" content="ie=edge">
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <title>Apache DataFusion Python 44.0.0 Released - Apache DataFusion
Blog</title>
-<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
-<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
-<link href="/blog/css/headerlink.css" rel="stylesheet">
-<link href="/blog/highlight/default.min.css" rel="stylesheet">
-<script src="/blog/highlight/highlight.js"></script>
-<script>hljs.highlightAll();</script> </head>
- <body class="d-flex flex-column h-100">
- <main class="flex-shrink-0">
-<!-- nav bar -->
-<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth
navbar example">
- <div class="container-fluid">
- <a class="navbar-brand" href="/blog"><img
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache
DataFusion Blog</a>
- <button class="navbar-toggler" type="button" data-bs-toggle="collapse"
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false"
aria-label="Toggle navigation">
- <span class="navbar-toggler-icon"></span>
- </button>
-
- <div class="collapse navbar-collapse" id="navbarADP">
- <ul class="navbar-nav me-auto mb-2 mb-lg-0">
- <li class="nav-item">
- <a class="nav-link" href="/blog/about.html">About</a>
- </li>
- <li class="nav-item">
- <a class="nav-link" href="/blog/feed.xml">RSS</a>
- </li>
- </ul>
- </div>
- </div>
-</nav>
-
-
-<!-- page contents -->
-<div id="contents">
- <div class="bg-white p-5 rounded">
- <div class="col-sm-8 mx-auto">
- <h1>
- Apache DataFusion Python 44.0.0 Released
- </h1>
- <p>Posted on: Fri 07 February 2025 by timsaucer</p>
- <!--
-{% comment %}
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-{% endcomment %}
--->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python 44.0.0</a>
has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch</code> from a <code>RecordBatchStream</code>
was a synchronous call, which would
-require the end user's code to wait for the data retrieval. This is described
in
-<a href="https://github.com/apache/datafusion-python/issues/974">Issue
974</a>. We continue to support this as a synchronous iterator, but we have
also added
-the ability to retrieve the <code>RecordBatch</code> using the Python
asynchronous <code>anext</code>
-function.</p>
-<h2>Default Compression for Parquet files</h2>
-<p>With <a href="https://github.com/apache/datafusion-python/pull/981">PR
981</a>, we change the saving of Parquet files to use zstd compression by
default.
-Previously the default was uncompressed, causing excessive disk storage. Zstd
is an
-excellent compression scheme that balances speed and compression ratio. Users
can still
-save their Parquet files uncompressed by passing in the appropriate value to
the
-<code>compression</code> argument when calling
<code>DataFrame.write_parquet</code>.</p>
-<h2><code>uv</code> package management</h2>
-<p><a href="https://github.com/astral-sh/uv">uv</a> is an extremely fast
Python package manager, written in Rust. In the previous version
-of <code>datafusion-python</code> we had a combination of settings for PyPI and
Conda. We have now
-switched to using <a href="https://github.com/astral-sh/uv">uv</a> as our
primary method for dependency management.</p>
-<p>For most users of DataFusion, this change will be transparent. You can
still install
-via <code>pip</code> or <code>conda</code>. For developers, the instructions
in the repository have been updated.</p>
-<h2>Migration Guide</h2>
-<p>During the upgrade from <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/43.0.0.md">DataFusion
43.0.0</a> to <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> as our upstream core
-dependency, we discovered a few changes were necessary within our repository
and our
-unit tests. These notes serve to help guide users who may encounter similar
issues when
-upgrading.</p>
-<ul>
-<li><code>RuntimeConfig</code> is now deprecated in favor of
<code>RuntimeEnvBuilder</code>. The migration is
-fairly straightforward, and the corresponding classes have been marked as
deprecated. For
-end users it should be simply a matter of changing the class name.</li>
-<li>If you perform a <code>concat</code> of a <code>string_view</code> and
<code>string</code>, it will now return a
-<code>string_view</code> instead of a <code>string</code>. This likely only
impacts unit tests that are validating
-return types. In general, it is recommended to switch to using
<code>string_view</code> whenever
-possible. You can see the blog articles <a
href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/">String
View Pt 1</a> and <a
href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/">Pt
2</a> for more information
-on these performance improvements.</li>
-<li>The function <code>date_part</code> now returns an <code>int32</code>
instead of a <code>float64</code>. This is likely
-only impactful to unit tests.</li>
-</ul>
-<h2>Appreciation</h2>
-<p>We would like to thank everyone who has helped with these releases through
their helpful
-conversations, code review, issue descriptions, and code authoring. We would
especially
-like to thank the following authors of PRs who made these releases possible,
listed in
-alphabetical order by username: <a
href="https://github.com/chenkovsky">@chenkovsky</a>, <a
href="https://github.com/ion-elgreco">@ion-elgreco</a>, <a
href="https://github.com/kylebarron">@kylebarron</a>, and
-<a href="https://github.com/kosiew">@kosiew</a>.</p>
-<p>Thank you!</p>
-<h2>Get Involved</h2>
-<p>The DataFusion Python team is an active and engaging community and we would
love
-to have you join us and help the project.</p>
-<p>Here are some ways to get involved:</p>
-<ul>
-<li>
-<p>Learn more by visiting the <a
href="https://datafusion.apache.org/python/index.html">DataFusion Python
project</a> page.</p>
-</li>
-<li>
-<p>Try out the project and provide feedback, file issues, and contribute
code.</p>
-</li>
-<li>
-<p>Join us on <a href="https://s.apache.org/slack-invite">ASF Slack</a> or the
<a href="https://discord.gg/Qw5gKqHxUM">Arrow Rust Discord Server</a>.</p>
-</li>
-</ul>
- </div>
- </div>
- </div>
- <!-- footer -->
- <div class="row">
- <div class="large-12 medium-12 columns">
- <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
- Copyright 2025, <a href="https://www.apache.org/">The Apache
Software Foundation</a>, Licensed under the <a
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version
2.0</a>.<br/>
- Apache® and the Apache feather logo are trademarks of The Apache
Software Foundation.
- </p>
- </div>
- </div>
- <script src="/blog/js/bootstrap.bundle.min.js"></script> </main>
- </body>
-</html>
diff --git a/blog/2025/02/17/datafusion-comet-0.6.0/index.html
b/blog/2025/02/17/datafusion-comet-0.6.0/index.html
new file mode 100644
index 0000000..a52fc4f
--- /dev/null
+++ b/blog/2025/02/17/datafusion-comet-0.6.0/index.html
@@ -0,0 +1,115 @@
+<!doctype html>
+<html class="no-js" lang="en" dir="ltr">
+ <head>
+ <meta charset="utf-8">
+ <meta http-equiv="x-ua-compatible" content="ie=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>Apache DataFusion Comet 0.6.0 Release - Apache DataFusion
Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script> </head>
+ <body class="d-flex flex-column h-100">
+ <main class="flex-shrink-0">
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth
navbar example">
+ <div class="container-fluid">
+ <a class="navbar-brand" href="/blog"><img
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache
DataFusion Blog</a>
+ <button class="navbar-toggler" type="button" data-bs-toggle="collapse"
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false"
aria-label="Toggle navigation">
+ <span class="navbar-toggler-icon"></span>
+ </button>
+
+ <div class="collapse navbar-collapse" id="navbarADP">
+ <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/about.html">About</a>
+ </li>
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/feed.xml">RSS</a>
+ </li>
+ </ul>
+ </div>
+ </div>
+</nav>
+
+
+<!-- page contents -->
+<div id="contents">
+ <div class="bg-white p-5 rounded">
+ <div class="col-sm-8 mx-auto">
+ <h1>
+ Apache DataFusion Comet 0.6.0 Release
+ </h1>
+ <p>Posted on: Mon 17 February 2025 by pmc</p>
+ <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the <a
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark physical
plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.</p>
+<p>Comet runs on commodity hardware and aims to provide 100% compatibility
with Apache Spark. Any operators or
+expressions that are not fully compatible will fall back to Spark unless
explicitly enabled by the user. Refer
+to the <a
href="https://datafusion.apache.org/comet/user-guide/compatibility.html">compatibility
guide</a> for more information.</p>
+<p>This release covers approximately four weeks of development work and is the
result of merging 39 PRs from 12
+contributors. See the <a
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.6.0.md">change
log</a> for more information.</p>
+<p>Starting with this release, we plan to release new versions of Comet
more frequently, typically within 1-2 weeks
+of each major DataFusion release.</p>
+<h2>Release Highlights</h2>
+<h3>DataFusion Upgrade</h3>
+<ul>
+<li>Comet 0.6.0 uses DataFusion 45.0.0</li>
+</ul>
+<h3>New Features</h3>
+<ul>
+<li>Comet now supports <code>array_join</code>, <code>array_intersect</code>,
and <code>arrays_overlap</code>.</li>
+</ul>
+<h3>Performance &amp; Stability</h3>
+<ul>
+<li>Metrics from native execution are now updated in Spark every 3 seconds by
default, rather than for each
+ batch being processed. The mechanism for passing the metrics via JNI is also
more efficient.</li>
+<li>New memory pool options "fair unified" and "unbounded" have been added.
See the <a
href="https://datafusion.apache.org/comet/user-guide/tuning.html">Comet Tuning
Guide</a> for more information.</li>
+</ul>
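+<p>As a rough illustration of the time-based metric updates described above, the following Rust sketch throttles updates to a fixed interval instead of sending one per batch. The <code>MetricsThrottle</code> type and the <code>count_updates</code> function are hypothetical names chosen for this example and are not taken from Comet's source; only the 3-second default comes from the release notes.</p>

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not Comet's actual code): decides whether enough
/// time has passed since the last metrics update to send another one.
struct MetricsThrottle {
    interval: Duration,
    last_update: Option<Instant>,
}

impl MetricsThrottle {
    fn new(interval: Duration) -> Self {
        Self { interval, last_update: None }
    }

    /// Returns true if metrics should be pushed at time `now`, and records
    /// that time as the latest update. The first call always returns true.
    fn should_update(&mut self, now: Instant) -> bool {
        match self.last_update {
            Some(last) if now.duration_since(last) < self.interval => false,
            _ => {
                self.last_update = Some(now);
                true
            }
        }
    }
}

/// Simulates processing one batch every `batch_gap` and counts how many
/// metric updates the throttle lets through.
fn count_updates(batches: u32, batch_gap: Duration, interval: Duration) -> u32 {
    let mut throttle = MetricsThrottle::new(interval);
    let start = Instant::now();
    let mut updates = 0;
    for i in 0..batches {
        // Simulated timestamp of the i-th batch; no real sleeping is needed.
        let now = start + batch_gap * i;
        if throttle.should_update(now) {
            updates += 1;
        }
    }
    updates
}

fn main() {
    // One batch per second, metrics at most every 3 seconds (the stated default).
    let updates = count_updates(10, Duration::from_secs(1), Duration::from_secs(3));
    println!("10 batches -> {updates} metric updates");
}
```

+<p>With one simulated batch per second and a 3-second interval, only 4 of the 10 batches trigger an update, which is the kind of reduction in cross-boundary traffic that a time-based scheme is meant to provide.</p>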
+<h2>Bug Fixes</h2>
+<ul>
+<li>Hashing of decimal values with precision &lt;= 18 is now compatible with
Spark</li>
+<li>Comet falls back to Spark when hashing decimals with precision &gt; 18</li>
+</ul>
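+<p>The two decimal hashing fixes above amount to a precision check at planning time. The Rust sketch below shows that decision in isolation. The names <code>HashPath</code>, <code>decimal_hash_path</code>, and <code>MAX_LONG_DIGITS</code> are hypothetical and do not come from Comet's source. The link between the 18-digit cutoff and 64-bit integers is an assumption made for illustration: every 18-digit unscaled value fits in an <code>i64</code>, while a 19-digit value may not.</p>

```rust
/// Assumed cutoff: the largest decimal precision whose unscaled value
/// always fits in an i64. The name is hypothetical.
const MAX_LONG_DIGITS: u8 = 18;

#[derive(Debug, PartialEq)]
enum HashPath {
    /// Hash natively, treating the unscaled value as a 64-bit integer.
    Native,
    /// Hand the expression back to Spark.
    FallBackToSpark,
}

/// Hypothetical planning check, not Comet's actual code.
fn decimal_hash_path(precision: u8) -> HashPath {
    if precision <= MAX_LONG_DIGITS {
        HashPath::Native
    } else {
        HashPath::FallBackToSpark
    }
}

fn main() {
    // Every 18-digit unscaled value fits in an i64...
    assert!(10_i128.pow(18) - 1 <= i64::MAX as i128);
    // ...but the largest 19-digit value does not.
    assert!(10_i128.pow(19) - 1 > i64::MAX as i128);

    for precision in [10, 18, 19, 38] {
        println!("precision {precision}: {:?}", decimal_hash_path(precision));
    }
}
```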
+<h2>Getting Involved</h2>
+<p>The Comet project welcomes new contributors. We use the same <a
href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack
and Discord</a> channels as the main DataFusion
+project and have a weekly <a
href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion
video call</a>.</p>
+<p>The easiest way to get involved is to test Comet with your current Spark
jobs and file issues for any bugs or
+performance regressions that you find. See the <a
href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting
Started</a> guide for instructions on downloading and installing
+Comet.</p>
+<p>There are also many <a
href="https://github.com/apache/datafusion-comet/contribute">good first
issues</a> waiting for contributions.</p>
+ </div>
+ </div>
+ </div>
+ <!-- footer -->
+ <div class="row">
+ <div class="large-12 medium-12 columns">
+ <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+ Copyright 2025, <a href="https://www.apache.org/">The Apache
Software Foundation</a>, Licensed under the <a
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version
2.0</a>.<br/>
+ Apache® and the Apache feather logo are trademarks of The Apache
Software Foundation.
+ </p>
+ </div>
+ </div>
+ <script src="/blog/js/bootstrap.bundle.min.js"></script> </main>
+ </body>
+</html>
diff --git a/blog/author/milenkovicm.html b/blog/author/milenkovicm.html
new file mode 100644
index 0000000..2ec6174
--- /dev/null
+++ b/blog/author/milenkovicm.html
@@ -0,0 +1,109 @@
+ <!doctype html>
+ <html class="no-js" lang="en" dir="ltr">
+ <head>
+ <meta charset="utf-8">
+ <meta http-equiv="x-ua-compatible" content="ie=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>Apache DataFusion Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script> <link
href="/blog/css/blog_index.css" rel="stylesheet">
+ </head>
+ <body class="d-flex flex-column h-100">
+ <main class="flex-shrink-0">
+ <div>
+
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth
navbar example">
+ <div class="container-fluid">
+ <a class="navbar-brand" href="/blog"><img
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache
DataFusion Blog</a>
+ <button class="navbar-toggler" type="button" data-bs-toggle="collapse"
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false"
aria-label="Toggle navigation">
+ <span class="navbar-toggler-icon"></span>
+ </button>
+
+ <div class="collapse navbar-collapse" id="navbarADP">
+ <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/about.html">About</a>
+ </li>
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/feed.xml">RSS</a>
+ </li>
+ </ul>
+ </div>
+ </div>
+</nav>
+ <div id="contents">
+ <div class="bg-white p-5 rounded">
+ <div class="col-sm-8 mx-auto">
+<div id="contents">
+ <div class="bg-white p-5 rounded">
+ <div class="col-sm-8 mx-auto">
+
+ <h3>Welcome to the Apache DataFusion Blog!</h3>
+ <p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
+
+
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/02/02/datafusion-ballista-43.0.0">Apache DataFusion Ballista
43.0.0 Released</a></h1>
+ <p>Posted on: Sun 02 February 2025 by milenkovicm</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be scaled
out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/02/02/datafusion-ballista-43.0.0" class="button
medium">Continue Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
+
+ </div>
+ </div>
+</div> </div>
+ </div>
+ </div>
+
+ <!-- footer -->
+ <div class="row">
+ <div class="large-12 medium-12 columns">
+ <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+ Copyright 2025, <a href="https://www.apache.org/">The Apache
Software Foundation</a>, Licensed under the <a
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version
2.0</a>.<br/>
+ Apache® and the Apache feather logo are trademarks of The Apache
Software Foundation.
+ </p>
+ </div>
+ </div>
+ <script src="/blog/js/bootstrap.bundle.min.js"></script> </div>
+ </main>
+ </body>
+ </html>
diff --git a/blog/author/pmc.html b/blog/author/pmc.html
index 35da552..1c09344 100644
--- a/blog/author/pmc.html
+++ b/blog/author/pmc.html
@@ -47,6 +47,46 @@
<p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/02/17/datafusion-comet-0.6.0">Apache DataFusion Comet 0.6.0
Release</a></h1>
+ <p>Posted on: Mon 17 February 2025 by pmc</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the <a
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark physical
plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.</p>
+<p>Comet runs on commodity hardware and aims to …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/02/17/datafusion-comet-0.6.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
<!-- Post -->
<div class="row">
<div class="callout">
diff --git a/blog/author/timsaucer.html b/blog/author/timsaucer.html
index bffdf26..1a905f0 100644
--- a/blog/author/timsaucer.html
+++ b/blog/author/timsaucer.html
@@ -47,47 +47,6 @@
<p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
- <!-- Post -->
- <div class="row">
- <div class="callout">
- <article class="post">
- <header>
- <div class="title">
- <h1><a
href="/blog/2025/02/07/datafusion-python-44.0.0">Apache DataFusion Python
44.0.0 Released</a></h1>
- <p>Posted on: Fri 07 February 2025 by timsaucer</p>
- <p><!--
-{% comment %}
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-{% endcomment %}
--->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python 44.0.0</a>
has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch …</code></p></p>
- <footer>
- <ul class="actions">
- <div style="text-align: right"><a
href="/blog/2025/02/07/datafusion-python-44.0.0" class="button medium">Continue
Reading</a></div>
- </ul>
- <ul class="stats">
- </ul>
- </footer>
- </article>
- </div>
- </div>
<!-- Post -->
<div class="row">
<div class="callout">
diff --git a/blog/category/blog.html b/blog/category/blog.html
index 4dc46dc..739cf1c 100644
--- a/blog/category/blog.html
+++ b/blog/category/blog.html
@@ -53,8 +53,8 @@
<article class="post">
<header>
<div class="title">
- <h1><a
href="/blog/2025/02/07/datafusion-python-44.0.0">Apache DataFusion Python
44.0.0 Released</a></h1>
- <p>Posted on: Fri 07 February 2025 by timsaucer</p>
+ <h1><a
href="/blog/2025/02/17/datafusion-comet-0.6.0">Apache DataFusion Comet 0.6.0
Release</a></h1>
+ <p>Posted on: Mon 17 February 2025 by pmc</p>
<p><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
@@ -73,14 +73,53 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python 44.0.0</a>
has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch …</code></p></p>
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the <a
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark physical
plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.</p>
+<p>Comet runs on commodity hardware and aims to …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/02/17/datafusion-comet-0.6.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/02/02/datafusion-ballista-43.0.0">Apache DataFusion Ballista
43.0.0 Released</a></h1>
+ <p>Posted on: Sun 02 February 2025 by milenkovicm</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be scaled
out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of …</p></p>
<footer>
<ul class="actions">
- <div style="text-align: right"><a
href="/blog/2025/02/07/datafusion-python-44.0.0" class="button medium">Continue
Reading</a></div>
+ <div style="text-align: right"><a
href="/blog/2025/02/02/datafusion-ballista-43.0.0" class="button
medium">Continue Reading</a></div>
</ul>
<ul class="stats">
</ul>
diff --git a/blog/feed.xml b/blog/feed.xml
index 6ed48dd..d9f2c7b 100644
--- a/blog/feed.xml
+++ b/blog/feed.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Fri,
07 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion
Python 44.0.0
Released</title><link>https://datafusion.apache.org/blog/2025/02/07/datafusion-python-44.0.0</link><description><!--
+<rss version="2.0"><channel><title>Apache DataFusion
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
17 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet
0.6.0
Release</title><link>https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -17,11 +17,31 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python
44.0.0</a> has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch
…</code></p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">timsaucer</dc:creator><pubDate>Fri,
07 Feb 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-02-07:/blog/2025/02/07/datafusion-python-44.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.5.0
Release</title><link>https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0</lin
[...]
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>Comet runs on commodity hardware and aims to
…</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Mon, 17
Feb 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-02-17:/blog/2025/02/17/datafusion-comet-0.6.0</guid><category>blog</category></item><item><title>Apache
DataFusion Ballista 43.0.0
Released</title><link>https://datafusion.apache.org/blog/2025/02/02/datafusion-ballista-43.0.0</link><d
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be
scaled out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of …</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">milenkovicm</dc:creator><pubDate>Sun,
02 Feb 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-02-02:/blog/2025/02/02/datafusion-ballista-43.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.5.0
Release</title><link>https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index ea19200..df2032d 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion
Blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-07T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Python 44.0.0 Released</title><link
href="https://datafusion.apache.org/blog/2025/02/07/datafusion-python-44.0.0
[...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion
Blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.6.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0" rel
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -17,11 +17,10 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python
44.0.0</a> has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch
…</code></p></summary><content type="html"><!--
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>Comet runs on commodity hardware and aims to
…</p></summary><content type="html"><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -39,66 +38,133 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python
44.0.0</a> has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch</code> from a
<code>RecordBatchStream</code> was a synchronous call, which would
-require the end user's code to wait for the data retrieval. This is described
in
-<a href="https://github.com/apache/datafusion-python/issues/974">Issue
974</a>. We continue to support this as a synchronous iterator, but we
have also added
-in the ability to retrieve the <code>RecordBatch</code> using the
Python asynchronous <code>anext</code>
-function.</p>
-<h1>Default Compression for Parquet files</h1>
-<p>With <a
href="https://github.com/apache/datafusion-python/pull/981">PR
981</a>, we change the saving of Parquet files to use zstd compression by
default.
-Previously the default was uncompressed, causing excessive disk storage. Zstd
is an
-excellent compression scheme that balances speed and compression ratio. Users
can still
-save their Parquet files uncompressed by passing in the appropriate value to
the
-<code>compression</code> argument when calling
<code>DataFrame.write_parquet</code>.</p>
-<h2><code>uv</code> package management</h2>
-<p><a href="https://github.com/astral-sh/uv">uv</a> is an
extremely fast Python package manager, written in Rust. In the previous version
-of <code>datafusion-python</code> we had a combination of settings
of PyPi and Conda. Instead, we
-switch to using <a href="https://github.com/astral-sh/uv">uv</a>
is our primary method for dependency management.</p>
-<p>For most users of DataFusion, this change will be transparent. You
can still install
-via <code>pip</code> or <code>conda</code>. For
developers, the instructions in the repository have been updated.</p>
-<h2>Migration Guide</h2>
-<p>During the upgrade from <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/43.0.0.md">DataFusion
43.0.0</a> to <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> as our upstream core
-dependency, we discovered a few changes were necessary within our repository
and our
-unit tests. These notes serve to help guide users who may encounter similar
issues when
-upgrading.</p>
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>Comet runs on commodity hardware and aims to provide 100%
compatibility with Apache Spark. Any operators or
+expressions that are not fully compatible will fall back to Spark unless
explicitly enabled by the user. Refer
+to the <a
href="https://datafusion.apache.org/comet/user-guide/compatibility.html">compatibility
guide</a> for more information.</p>
+<p>This release covers approximately four weeks of development work and
is the result of merging 39 PRs from 12
+contributors. See the <a
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.6.0.md">change
log</a> for more information.</p>
+<p>Starting with this release, we plan to release new versions of
Comet more frequently, typically within 1-2 weeks
+of each major DataFusion release.</p>
+<h2>Release Highlights</h2>
+<h3>DataFusion Upgrade</h3>
<ul>
-<li><code>RuntimeConfig</code> is now deprecated in favor of
<code>RuntimeEnvBuilder</code>. The migration is
-fairly straightforward, and the corresponding classes have been marked as
deprecated. For
-end users it should be simply a matter of changing the class name.</li>
-<li>If you perform a <code>concat</code> of a
<code>string_view</code> and <code>string</code>, it
will now return a
-<code>string_view</code> instead of a
<code>string</code>. This likely only impacts unit tests that are
validating
-return types. In general, it is recommended to switch to using
<code>string_view</code> whenever
-possible. You can see the blog articles <a
href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/">String
View Pt 1</a> and <a
href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/">Pt
2</a> for more information
-on these performance improvements.</li>
-<li>The function <code>date_part</code> now returns an
<code>int32</code> instead of a <code>float64</code>.
This is likely
-only impactful to unit tests.</li>
+<li>Comet 0.6.0 uses DataFusion 45.0.0</li>
</ul>
-<h2>Appreciation</h2>
-<p>We would like to thank everyone who has helped with these releases
through their helpful
-conversations, code review, issue descriptions, and code authoring. We would
especially
-like to thank the following authors of PRs who made these releases possible,
listed in
-alphabetical order by username: <a
href="https://github.com/chenkovsky">@chenkovsky</a>, <a
href="https://github.com/ion-elgreco">@ion-elgreco</a>, <a
href="https://github.com/kylebarron">@kylebarron</a>, and
-<a href="https://github.com/kosiew">@kosiew</a>.</p>
-<p>Thank you!</p>
-<h2>Get Involved</h2>
-<p>The DataFusion Python team is an active and engaging community and we
would love
-to have you join us and help the project.</p>
-<p>Here are some ways to get involved:</p>
+<h3>New Features</h3>
<ul>
-<li>
-<p>Learn more by visiting the <a
href="https://datafusion.apache.org/python/index.html">DataFusion Python
project</a> page.</p>
-</li>
-<li>
-<p>Try out the project and provide feedback, file issues, and contribute
code.</p>
-</li>
-<li>
-<p>Join us on <a href="https://s.apache.org/slack-invite">ASF
Slack</a> or the <a href="https://discord.gg/Qw5gKqHxUM">Arrow Rust
Discord Server</a>.</p>
-</li>
-</ul></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.5.0
Release</title><link
href="https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0"
rel="alternate"></link><published>2025-01-17T00:00:00+00:00</published><updated>2025-01-17T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-01-17:/blog/2025/01/17/datafusion-comet-0.5.0</id><summary
type="html"><!--
+<li>Comet now supports <code>array_join</code>,
<code>array_intersect</code>, and
<code>arrays_overlap</code>.</li>
+</ul>
+<h3>Performance &amp; Stability</h3>
+<ul>
+<li>Metrics from native execution are now updated in Spark every 3
seconds by default, rather than for each
+ batch being processed. The mechanism for passing the metrics via JNI is also
more efficient.</li>
+<li>New memory pool options "fair unified" and "unbounded" have been
added. See the <a
href="https://datafusion.apache.org/comet/user-guide/tuning.html">Comet
Tuning Guide</a> for more information.</li>
+</ul>
+<h2>Bug Fixes</h2>
+<ul>
+<li>Hashing of decimal values with precision &lt;= 18 is now
compatible with Spark</li>
+<li>Comet falls back to Spark when hashing decimals with precision
&gt; 18</li>
+</ul>
+<h2>Getting Involved</h2>
+<p>The Comet project welcomes new contributors. We use the same <a
href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack
and Discord</a> channels as the main DataFusion
+project and have a weekly <a
href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion
video call</a>.</p>
+<p>The easiest way to get involved is to test Comet with your current
Spark jobs and file issues for any bugs or
+performance regressions that you find. See the <a
href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting
Started</a> guide for instructions on downloading and installing
+Comet.</p>
+<p>There are also many <a
href="https://github.com/apache/datafusion-comet/contribute">good first
issues</a> waiting for contributions.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Ballista 43.0.0
Released</title><link
href="https://datafusion.apache.org/blog/2025/02/02/datafusion-ballista-43.0.0"
rel="alternate"></link><published>2025-02-02T00:00:00+00:00</published><updated>2025-02-02T00:00:00+00:00</updated><author><name
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be
scaled out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of …</p></summary><content
type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be
scaled out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of this release has been to integrate more
seamlessly with the DataFusion ecosystem and to aim for the same
level of flexibility as DataFusion.</p>
+<p>In recent months, our development efforts have been directed toward
providing a robust and extensible Ballista API. This new API empowers end-users
to tailor Ballista's core functionality to their specific use cases. As a
result, we have deprecated several experimental features from the Ballista
core, allowing users to reintroduce them as custom extensions outside the core
framework. This shift reduces the maintenance burden on Ballista's core
maintainers and paves the way for o [...]
+<p>The most significant enhancement in this release is the deprecation
of <code>BallistaContext</code>, which has been superseded by the
DataFusion <code>SessionContext</code>. This change enables
DataFusion applications written in Rust to execute on a Ballista cluster with
minimal modifications. Beyond simplifying migration and reducing maintenance
overhead, this update introduces distributed write functionality to Ballista
for the first time, significantly [...]
+<div
class="codehilite"><pre><span></span><code><span
class="k">use</span><span class="w"> </span><span
class="n">ballista</span>::<span
class="n">prelude</span>::<span class="o">*</span><span
class="p">;</span><span class="w"></span>
+<span class="k">use</span><span class="w">
</span><span class="n">datafusion</span>::<span
class="n">prelude</span>::<span class="o">*</span><span
class="p">;</span><span class="w"></span>
+
+<span class="cp">#[tokio::main]</span><span
class="w"></span>
+<span class="k">async</span><span class="w">
</span><span class="k">fn</span> <span
class="nf">main</span><span class="p">()</span><span
class="w"> </span>-&gt; <span
class="nc">datafusion</span>::<span
class="n">error</span>::<span
class="nb">Result</span><span
class="o">&lt;</span><span class="p">()</span><span
class="o">&gt;</span> [...]
+
+<span class="w"> </span><span class="c1">// Instead of
creating classic SessionContext</span>
+<span class="w"> </span><span class="c1">// let ctx =
SessionContext::new();</span>
+
+<span class="w"> </span><span class="c1">// create
DataFusion SessionContext with ballista standalone cluster started</span>
+<span class="w"> </span><span class="c1">// let ctx =
SessionContext::standalone().await;</span>
+
+<span class="w"> </span><span class="c1">// create
DataFusion SessionContext with ballista remote cluster started</span>
+<span class="w"> </span><span
class="kd">let</span><span class="w"> </span><span
class="n">ctx</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">SessionContext</span>::<span
class="n">remote</span><span class="p">(</span><span
class="s">"df://localhost:50050"</span><span
class="p">).</span><span cla [...]
+
+<span class="w"> </span><span class="c1">// register the
table</span>
+<span class="w"> </span><span
class="n">ctx</span><span class="p">.</span><span
class="n">register_csv</span><span
class="p">(</span><span class="s">"example"</span><span
class="p">,</span><span class="w"> </span><span
class="s">"tests/data/example.csv"</span><span
class="p">,</span><span class="w"> </span><span
class="n">CsvReadOptions</span>:: [...]
+
+<span class="w"> </span><span class="c1">// create a plan
to run a SQL query</span>
+<span class="w"> </span><span
class="kd">let</span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">ctx</span><span class="p">.</span><span
class="n">sql</span><span class="p">(</span><span
class="s">"SELECT a, MIN(b) FROM example WHERE a &lt;= b GROUP BY a LIM
[...]
+
+<span class="w"> </span><span class="c1">// execute and
print results</span>
+<span class="w"> </span><span
class="n">df</span><span class="p">.</span><span
class="n">show</span><span class="p">().</span><span
class="k">await</span><span class="o">?</span><span
class="p">;</span><span class="w"></span>
+<span class="w"> </span><span
class="nb">Ok</span><span class="p">(())</span><span
class="w"></span>
+<span class="p">}</span><span class="w"></span>
+</code></pre></div>
+<p>Additionally, Ballista&rsquo;s versioning scheme has been aligned
with that of DataFusion, ensuring that Ballista's version number reflects the
compatible DataFusion version.</p>
+<p>At the moment there is still a feature gap between DataFusion and Ballista, which
we will try to bridge in the future.</p>
+<h3>Removal of Experimental Features</h3>
+<p>Ballista had grown in scope to include several experimental features
in various states of completeness. Some features have been removed from this
release in an effort to strip Ballista back to its core and make it easier to
maintain and extend.</p>
+<p>Specifically, the caching subsystem, predefined object store
registry, plugin subsystem, key-value stores for persistent scheduler state,
and the UI have been removed.</p>
+<h3>Performance &amp; Scalability</h3>
+<p>Ballista has benefited significantly from the advancements made in the
DataFusion project over the past year. Benchmark results show notable
performance improvements:</p>
+<p>Per query comparison:</p>
+<p><img alt="Per query comparison" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png"
width="100%"/></p>
+<p>Relative speedup:</p>
+<p><img alt="Relative speedup graph" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png"
width="100%"/></p>
+<p>The overall speedup is 2.9x.</p>
+<p><img alt="Overall speedup" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png"
width="50%"/></p>
+<h3>New Logo</h3>
+<p>Ballista now has a new logo, which is visually similar to the logos of other
DataFusion projects.</p>
+<p><img alt="New logo" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/ballista-logo.png"
width="50%"/></p>
+<h2>Roadmap</h2>
+<p>Moving forward, Ballista will adopt the same release cadence as
DataFusion, providing synchronized updates across the ecosystem.
+Currently, there is no established long-term roadmap for Ballista. A plan will
be formulated in the coming months based on community feedback and the
availability of additional maintainers.</p>
+<p>In the short term, development efforts will concentrate on closing
the feature gap between DataFusion and Ballista. Key priorities include
implementing support for <code>INSERT INTO</code>, enabling table
<code>URL</code> functionality, and achieving deeper integration
with the Python ecosystem.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.5.0
Release</title><link href="https://datafusion.apache.org/b [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index 2a31eec..98142c5 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-07T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Python 44.0.0 Released</title><link
href="https://datafusion.apache.org/blog/2025/02/07/datafusion-python-4 [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.6.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0 [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -17,11 +17,10 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python
44.0.0</a> has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch
…</code></p></summary><content type="html"><!--
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>Comet runs on commodity hardware and aims to
…</p></summary><content type="html"><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -39,66 +38,133 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python
44.0.0</a> has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch</code> from a
<code>RecordBatchStream</code> was a synchronous call, which would
-require the end user's code to wait for the data retrieval. This is described
in
-<a href="https://github.com/apache/datafusion-python/issues/974">Issue
974</a>. We continue to support this as a synchronous iterator, but we
have also added
-the ability to retrieve the <code>RecordBatch</code> using the
Python asynchronous <code>anext</code>
-function.</p>
-<h1>Default Compression for Parquet files</h1>
-<p>With <a
href="https://github.com/apache/datafusion-python/pull/981">PR
981</a>, we change the saving of Parquet files to use zstd compression by
default.
-Previously the default was uncompressed, causing excessive disk storage. Zstd
is an
-excellent compression scheme that balances speed and compression ratio. Users
can still
-save their Parquet files uncompressed by passing in the appropriate value to
the
-<code>compression</code> argument when calling
<code>DataFrame.write_parquet</code>.</p>
-<h2><code>uv</code> package management</h2>
-<p><a href="https://github.com/astral-sh/uv">uv</a> is an
extremely fast Python package manager, written in Rust. In the previous version
-of <code>datafusion-python</code> we had a combination of settings
of PyPI and Conda. Instead, we
-switch to using <a href="https://github.com/astral-sh/uv">uv</a>
as our primary method for dependency management.</p>
-<p>For most users of DataFusion, this change will be transparent. You
can still install
-via <code>pip</code> or <code>conda</code>. For
developers, the instructions in the repository have been updated.</p>
-<h2>Migration Guide</h2>
-<p>During the upgrade from <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/43.0.0.md">DataFusion
43.0.0</a> to <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> as our upstream core
-dependency, we discovered a few changes were necessary within our repository
and our
-unit tests. These notes serve to help guide users who may encounter similar
issues when
-upgrading.</p>
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>Comet runs on commodity hardware and aims to provide 100%
compatibility with Apache Spark. Any operators or
+expressions that are not fully compatible will fall back to Spark unless
explicitly enabled by the user. Refer
+to the <a
href="https://datafusion.apache.org/comet/user-guide/compatibility.html">compatibility
guide</a> for more information.</p>
+<p>This release covers approximately four weeks of development work and
is the result of merging 39 PRs from 12
+contributors. See the <a
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.6.0.md">change
log</a> for more information.</p>
+<p>Starting with this release, we plan to release new versions of
Comet more frequently, typically within 1-2 weeks
+of each major DataFusion release.</p>
+<h2>Release Highlights</h2>
+<h3>DataFusion Upgrade</h3>
<ul>
-<li><code>RuntimeConfig</code> is now deprecated in favor of
<code>RuntimeEnvBuilder</code>. The migration is
-fairly straightforward, and the corresponding classes have been marked as
deprecated. For
-end users it should be simply a matter of changing the class name.</li>
-<li>If you perform a <code>concat</code> of a
<code>string_view</code> and <code>string</code>, it
will now return a
-<code>string_view</code> instead of a
<code>string</code>. This likely only impacts unit tests that are
validating
-return types. In general, it is recommended to switch to using
<code>string_view</code> whenever
-possible. You can see the blog articles <a
href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/">String
View Pt 1</a> and <a
href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/">Pt
2</a> for more information
-on these performance improvements.</li>
-<li>The function <code>date_part</code> now returns an
<code>int32</code> instead of a <code>float64</code>.
This is likely
-only impactful to unit tests.</li>
+<li>Comet 0.6.0 uses DataFusion 45.0.0</li>
</ul>
-<h2>Appreciation</h2>
-<p>We would like to thank everyone who has helped with these releases
through their helpful
-conversations, code review, issue descriptions, and code authoring. We would
especially
-like to thank the following authors of PRs who made these releases possible,
listed in
-alphabetical order by username: <a
href="https://github.com/chenkovsky">@chenkovsky</a>, <a
href="https://github.com/ion-elgreco">@ion-elgreco</a>, <a
href="https://github.com/kylebarron">@kylebarron</a>, and
-<a href="https://github.com/kosiew">@kosiew</a>.</p>
-<p>Thank you!</p>
-<h2>Get Involved</h2>
-<p>The DataFusion Python team is an active and engaging community and we
would love
-to have you join us and help the project.</p>
-<p>Here are some ways to get involved:</p>
+<h3>New Features</h3>
<ul>
-<li>
-<p>Learn more by visiting the <a
href="https://datafusion.apache.org/python/index.html">DataFusion Python
project</a> page.</p>
-</li>
-<li>
-<p>Try out the project and provide feedback, file issues, and contribute
code.</p>
-</li>
-<li>
-<p>Join us on <a href="https://s.apache.org/slack-invite">ASF
Slack</a> or the <a href="https://discord.gg/Qw5gKqHxUM">Arrow Rust
Discord Server</a>.</p>
-</li>
-</ul></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.5.0
Release</title><link
href="https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0"
rel="alternate"></link><published>2025-01-17T00:00:00+00:00</published><updated>2025-01-17T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-01-17:/blog/2025/01/17/datafusion-comet-0.5.0</id><summary
type="html"><!--
+<li>Comet now supports <code>array_join</code>,
<code>array_intersect</code>, and
<code>arrays_overlap</code>.</li>
+</ul>
+<h3>Performance &amp; Stability</h3>
+<ul>
+<li>Metrics from native execution are now updated in Spark every 3
seconds by default, rather than for each
+ batch being processed. The mechanism for passing the metrics via JNI is also
more efficient.</li>
+<li>New memory pool options "fair unified" and "unbounded" have been
added. See the <a
href="https://datafusion.apache.org/comet/user-guide/tuning.html">Comet
Tuning Guide</a> for more information.</li>
+</ul>
+<h2>Bug Fixes</h2>
+<ul>
+<li>Hashing of decimal values with precision &lt;= 18 is now
compatible with Spark</li>
+<li>Comet falls back to Spark when hashing decimals with precision
&gt; 18</li>
+</ul>
+<h2>Getting Involved</h2>
+<p>The Comet project welcomes new contributors. We use the same <a
href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack
and Discord</a> channels as the main DataFusion
+project and have a weekly <a
href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion
video call</a>.</p>
+<p>The easiest way to get involved is to test Comet with your current
Spark jobs and file issues for any bugs or
+performance regressions that you find. See the <a
href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting
Started</a> guide for instructions on downloading and installing
+Comet.</p>
+<p>There are also many <a
href="https://github.com/apache/datafusion-comet/contribute">good first
issues</a> waiting for contributions.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Ballista 43.0.0
Released</title><link
href="https://datafusion.apache.org/blog/2025/02/02/datafusion-ballista-43.0.0"
rel="alternate"></link><published>2025-02-02T00:00:00+00:00</published><updated>2025-02-02T00:00:00+00:00</updated><author><name
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be
scaled out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of …</p></summary><content
type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be
scaled out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of this release has been a more seamless
integration with the DataFusion ecosystem, with the aim of matching
DataFusion's level of flexibility.</p>
+<p>In recent months, our development efforts have been directed toward
providing a robust and extensible Ballista API. This new API empowers end-users
to tailor Ballista's core functionality to their specific use cases. As a
result, we have deprecated several experimental features from the Ballista
core, allowing users to reintroduce them as custom extensions outside the core
framework. This shift reduces the maintenance burden on Ballista's core
maintainers and paves the way for o [...]
+<p>The most significant enhancement in this release is the deprecation
of <code>BallistaContext</code>, which has been superseded by the
DataFusion <code>SessionContext</code>. This change enables
DataFusion applications written in Rust to execute on a Ballista cluster with
minimal modifications. Beyond simplifying migration and reducing maintenance
overhead, this update introduces distributed write functionality to Ballista
for the first time, significantly [...]
+<div
class="codehilite"><pre><span></span><code><span
class="k">use</span><span class="w"> </span><span
class="n">ballista</span>::<span
class="n">prelude</span>::<span class="o">*</span><span
class="p">;</span><span class="w"></span>
+<span class="k">use</span><span class="w">
</span><span class="n">datafusion</span>::<span
class="n">prelude</span>::<span class="o">*</span><span
class="p">;</span><span class="w"></span>
+
+<span class="cp">#[tokio::main]</span><span
class="w"></span>
+<span class="k">async</span><span class="w">
</span><span class="k">fn</span> <span
class="nf">main</span><span class="p">()</span><span
class="w"> </span>-&gt; <span
class="nc">datafusion</span>::<span
class="n">error</span>::<span
class="nb">Result</span><span
class="o">&lt;</span><span class="p">()</span><span
class="o">&gt;</span> [...]
+
+<span class="w"> </span><span class="c1">// Instead of
creating a classic SessionContext</span>
+<span class="w"> </span><span class="c1">// let ctx =
SessionContext::new();</span>
+
+<span class="w"> </span><span class="c1">// create
a DataFusion SessionContext that starts a standalone Ballista cluster</span>
+<span class="w"> </span><span class="c1">// let ctx =
SessionContext::standalone().await;</span>
+
+<span class="w"> </span><span class="c1">// create
a DataFusion SessionContext connected to a remote Ballista cluster</span>
+<span class="w"> </span><span
class="kd">let</span><span class="w"> </span><span
class="n">ctx</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">SessionContext</span>::<span
class="n">remote</span><span class="p">(</span><span
class="s">"df://localhost:50050"</span><span
class="p">).</span><span cla [...]
+
+<span class="w"> </span><span class="c1">// register the
table</span>
+<span class="w"> </span><span
class="n">ctx</span><span class="p">.</span><span
class="n">register_csv</span><span
class="p">(</span><span class="s">"example"</span><span
class="p">,</span><span class="w"> </span><span
class="s">"tests/data/example.csv"</span><span
class="p">,</span><span class="w"> </span><span
class="n">CsvReadOptions</span>:: [...]
+
+<span class="w"> </span><span class="c1">// create a plan
to run a SQL query</span>
+<span class="w"> </span><span
class="kd">let</span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">ctx</span><span class="p">.</span><span
class="n">sql</span><span class="p">(</span><span
class="s">"SELECT a, MIN(b) FROM example WHERE a &lt;= b GROUP BY a LIM
[...]
+
+<span class="w"> </span><span class="c1">// execute and
print results</span>
+<span class="w"> </span><span
class="n">df</span><span class="p">.</span><span
class="n">show</span><span class="p">().</span><span
class="k">await</span><span class="o">?</span><span
class="p">;</span><span class="w"></span>
+<span class="w"> </span><span
class="nb">Ok</span><span class="p">(())</span><span
class="w"></span>
+<span class="p">}</span><span class="w"></span>
+</code></pre></div>
+<p>Additionally, Ballista&rsquo;s versioning scheme has been aligned
with that of DataFusion, ensuring that Ballista's version number reflects the
compatible DataFusion version.</p>
+<p>At the moment there is a feature gap between DataFusion and Ballista, which
we will try to bridge in future releases.</p>
+<h3>Removal of Experimental Features</h3>
+<p>Ballista had grown in scope to include several experimental features
in various states of completeness. Some features have been removed from this
release in an effort to strip Ballista back to its core and make it easier to
maintain and extend.</p>
+<p>Specifically, the caching subsystem, predefined object store
registry, plugin subsystem, key-value stores for persistent scheduler state,
and the UI have been removed.</p>
+<h3>Performance &amp; Scalability</h3>
+<p>Ballista has benefited significantly from the advancements made in the
DataFusion project over the past year. Benchmark results show notable
performance improvements:</p>
+<p>Per query comparison:</p>
+<p><img alt="Per query comparison" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png"
width="100%"/></p>
+<p>Relative speedup:</p>
+<p><img alt="Relative speedup graph" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png"
width="100%"/></p>
+<p>The overall speedup is 2.9x.</p>
+<p><img alt="Overall speedup" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png"
width="50%"/></p>
+<h3>New Logo</h3>
+<p>Ballista now has a new logo, which is visually similar to the logos of other
DataFusion projects.</p>
+<p><img alt="New logo" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/ballista-logo.png"
width="50%"/></p>
+<h2>Roadmap</h2>
+<p>Moving forward, Ballista will adopt the same release cadence as
DataFusion, providing synchronized updates across the ecosystem.
+Currently, there is no established long-term roadmap for Ballista. A plan will
be formulated in the coming months based on community feedback and the
availability of additional maintainers.</p>
+<p>In the short term, development efforts will concentrate on closing
the feature gap between DataFusion and Ballista. Key priorities include
implementing support for <code>INSERT INTO</code>, enabling table
<code>URL</code> functionality, and achieving deeper integration
with the Python ecosystem.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.5.0
Release</title><link href="https://datafusion.apache.org/b [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/feeds/milenkovicm.atom.xml b/blog/feeds/milenkovicm.atom.xml
new file mode 100644
index 0000000..7ef8b39
--- /dev/null
+++ b/blog/feeds/milenkovicm.atom.xml
@@ -0,0 +1,92 @@
+<?xml version="1.0" encoding="utf-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
milenkovicm</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/milenkovicm.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-02T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Ballista 43.0.0 Released</title><link
href="https://datafusion.apache.org/blog/2025/02/02/dat [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be
scaled out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of …</p></summary><content
type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be
scaled out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of this release has been a more seamless
integration with the DataFusion ecosystem, with the aim of matching
DataFusion's level of flexibility.</p>
+<p>In recent months, our development efforts have been directed toward
providing a robust and extensible Ballista API. This new API empowers end-users
to tailor Ballista's core functionality to their specific use cases. As a
result, we have deprecated several experimental features from the Ballista
core, allowing users to reintroduce them as custom extensions outside the core
framework. This shift reduces the maintenance burden on Ballista's core
maintainers and paves the way for o [...]
+<p>The most significant enhancement in this release is the deprecation
of <code>BallistaContext</code>, which has been superseded by the
DataFusion <code>SessionContext</code>. This change enables
DataFusion applications written in Rust to execute on a Ballista cluster with
minimal modifications. Beyond simplifying migration and reducing maintenance
overhead, this update introduces distributed write functionality to Ballista
for the first time, significantly [...]
+<div
class="codehilite"><pre><span></span><code><span
class="k">use</span><span class="w"> </span><span
class="n">ballista</span>::<span
class="n">prelude</span>::<span class="o">*</span><span
class="p">;</span><span class="w"></span>
+<span class="k">use</span><span class="w">
</span><span class="n">datafusion</span>::<span
class="n">prelude</span>::<span class="o">*</span><span
class="p">;</span><span class="w"></span>
+
+<span class="cp">#[tokio::main]</span><span
class="w"></span>
+<span class="k">async</span><span class="w">
</span><span class="k">fn</span> <span
class="nf">main</span><span class="p">()</span><span
class="w"> </span>-&gt; <span
class="nc">datafusion</span>::<span
class="n">error</span>::<span
class="nb">Result</span><span
class="o">&lt;</span><span class="p">()</span><span
class="o">&gt;</span> [...]
+
+<span class="w"> </span><span class="c1">// Instead of
creating a classic SessionContext</span>
+<span class="w"> </span><span class="c1">// let ctx =
SessionContext::new();</span>
+
+<span class="w"> </span><span class="c1">// create
a DataFusion SessionContext that starts a standalone Ballista cluster</span>
+<span class="w"> </span><span class="c1">// let ctx =
SessionContext::standalone().await;</span>
+
+<span class="w"> </span><span class="c1">// create
a DataFusion SessionContext connected to a remote Ballista cluster</span>
+<span class="w"> </span><span
class="kd">let</span><span class="w"> </span><span
class="n">ctx</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">SessionContext</span>::<span
class="n">remote</span><span class="p">(</span><span
class="s">"df://localhost:50050"</span><span
class="p">).</span><span cla [...]
+
+<span class="w"> </span><span class="c1">// register the
table</span>
+<span class="w"> </span><span
class="n">ctx</span><span class="p">.</span><span
class="n">register_csv</span><span
class="p">(</span><span class="s">"example"</span><span
class="p">,</span><span class="w"> </span><span
class="s">"tests/data/example.csv"</span><span
class="p">,</span><span class="w"> </span><span
class="n">CsvReadOptions</span>:: [...]
+
+<span class="w"> </span><span class="c1">// create a plan
to run a SQL query</span>
+<span class="w"> </span><span
class="kd">let</span><span class="w"> </span><span
class="n">df</span><span class="w"> </span><span
class="o">=</span><span class="w"> </span><span
class="n">ctx</span><span class="p">.</span><span
class="n">sql</span><span class="p">(</span><span
class="s">"SELECT a, MIN(b) FROM example WHERE a &lt;= b GROUP BY a LIM
[...]
+
+<span class="w"> </span><span class="c1">// execute and
print results</span>
+<span class="w"> </span><span
class="n">df</span><span class="p">.</span><span
class="n">show</span><span class="p">().</span><span
class="k">await</span><span class="o">?</span><span
class="p">;</span><span class="w"></span>
+<span class="w"> </span><span
class="nb">Ok</span><span class="p">(())</span><span
class="w"></span>
+<span class="p">}</span><span class="w"></span>
+</code></pre></div>
+<p>Additionally, Ballista&rsquo;s versioning scheme has been aligned
with that of DataFusion, ensuring that Ballista's version number reflects the
compatible DataFusion version.</p>
+<p>At the moment there is a feature gap between DataFusion and Ballista, which
we will try to bridge in future releases.</p>
+<h3>Removal of Experimental Features</h3>
+<p>Ballista had grown in scope to include several experimental features
in various states of completeness. Some features have been removed from this
release in an effort to strip Ballista back to its core and make it easier to
maintain and extend.</p>
+<p>Specifically, the caching subsystem, predefined object store
registry, plugin subsystem, key-value stores for persistent scheduler state,
and the UI have been removed.</p>
+<h3>Performance &amp; Scalability</h3>
+<p>Ballista has benefited significantly from the advancements made in the
DataFusion project over the past year. Benchmark results show notable
performance improvements:</p>
+<p>Per query comparison:</p>
+<p><img alt="Per query comparison" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png"
width="100%"/></p>
+<p>Relative speedup:</p>
+<p><img alt="Relative speedup graph" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png"
width="100%"/></p>
+<p>The overall speedup is 2.9x.</p>
+<p><img alt="Overall speedup" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png"
width="50%"/></p>
+<h3>New Logo</h3>
+<p>Ballista now has a new logo, which is visually consistent with those of other
DataFusion projects.</p>
+<p><img alt="New logo" class="img-responsive"
src="/blog/images/datafusion-ballista-43.0.0/ballista-logo.png"
width="50%"/></p>
+<h2>Roadmap</h2>
+<p>Moving forward, Ballista will adopt the same release cadence as
DataFusion, providing synchronized updates across the ecosystem.
+Currently, there is no established long-term roadmap for Ballista. A plan will
be formulated in the coming months based on community feedback and the
availability of additional maintainers.</p>
+<p>In the short term, development efforts will concentrate on closing
the feature gap between DataFusion and Ballista. Key priorities include
implementing support for <code>INSERT INTO</code>, enabling table
<code>URL</code> functionality, and achieving deeper integration
with the Python ecosystem.</p></content><category
term="blog"></category></entry></feed>
\ No newline at end of file
diff --git a/blog/feeds/milenkovicm.rss.xml b/blog/feeds/milenkovicm.rss.xml
new file mode 100644
index 0000000..af53b6a
--- /dev/null
+++ b/blog/feeds/milenkovicm.rss.xml
@@ -0,0 +1,23 @@
+<?xml version="1.0" encoding="utf-8"?>
+<rss version="2.0"><channel><title>Apache DataFusion Blog -
milenkovicm</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Sun,
02 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion
Ballista 43.0.0
Released</title><link>https://datafusion.apache.org/blog/2025/02/02/datafusion-ballista-43.0.0</link><description><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of the <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be
scaled out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of …</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">milenkovicm</dc:creator><pubDate>Sun,
02 Feb 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-02-02:/blog/2025/02/02/datafusion-ballista-43.0.0</guid><category>blog</category></item></channel></rss>
\ No newline at end of file
diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml
index 31b92a2..8372ddc 100644
--- a/blog/feeds/pmc.atom.xml
+++ b/blog/feeds/pmc.atom.xml
@@ -1,5 +1,80 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
pmc</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-01-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.5.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0"
[...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
pmc</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.6.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0"
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>Comet runs on commodity hardware and aims to
…</p></summary><content type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>Comet runs on commodity hardware and aims to provide 100%
compatibility with Apache Spark. Any operators or
+expressions that are not fully compatible will fall back to Spark unless
explicitly enabled by the user. Refer
+to the <a
href="https://datafusion.apache.org/comet/user-guide/compatibility.html">compatibility
guide</a> for more information.</p>
+<p>This release covers approximately four weeks of development work and
is the result of merging 39 PRs from 12
+contributors. See the <a
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.6.0.md">change
log</a> for more information.</p>
+<p>Starting with this release, we plan to release new versions of
Comet more frequently, typically within 1-2 weeks
+of each major DataFusion release.</p>
+<h2>Release Highlights</h2>
+<h3>DataFusion Upgrade</h3>
+<ul>
+<li>Comet 0.6.0 uses DataFusion 45.0.0</li>
+</ul>
+<h3>New Features</h3>
+<ul>
+<li>Comet now supports <code>array_join</code>,
<code>array_intersect</code>, and
<code>arrays_overlap</code>.</li>
+</ul>
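+<p>As a rough illustration of what these three functions compute, here is a minimal Rust sketch of their semantics. It is not Comet's implementation: the helper names simply mirror the Spark SQL function names, null handling is modeled only for <code>array_join</code> (nulls are skipped, as when no null replacement is given), and the result order of <code>array_intersect</code> (order of the first array) is an assumption.</p>

```rust
use std::collections::HashSet;

/// Sketch of array_join: concatenate the non-null elements with a delimiter.
fn array_join(items: &[Option<&str>], delimiter: &str) -> String {
    items
        .iter()
        .flatten() // drop None (null) entries
        .copied()
        .collect::<Vec<&str>>()
        .join(delimiter)
}

/// Sketch of array_intersect: elements present in both arrays, without duplicates.
/// Assumption: results keep the order in which they appear in `left`.
fn array_intersect(left: &[i32], right: &[i32]) -> Vec<i32> {
    let right_set: HashSet<i32> = right.iter().copied().collect();
    let mut seen = HashSet::new();
    left.iter()
        .copied()
        // keep a value only if it is in `right` and has not been emitted yet
        .filter(|v| right_set.contains(v) && seen.insert(*v))
        .collect()
}

/// Sketch of arrays_overlap: true if the arrays share at least one element.
/// Null elements are not modeled here.
fn arrays_overlap(left: &[i32], right: &[i32]) -> bool {
    let right_set: HashSet<i32> = right.iter().copied().collect();
    left.iter().any(|v| right_set.contains(v))
}
```

+<p>For example, <code>array_join(&amp;[Some("a"), None, Some("b")], ",")</code> gives <code>"a,b"</code>, and <code>array_intersect(&amp;[1, 2, 2, 3], &amp;[2, 3, 4])</code> gives <code>[2, 3]</code>.</p>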
+<h3>Performance &amp; Stability</h3>
+<ul>
+<li>Metrics from native execution are now updated in Spark every 3
seconds by default, rather than for each
+ batch being processed. The mechanism for passing the metrics via JNI is also
more efficient.</li>
+<li>New memory pool options "fair unified" and "unbounded" have been
added. See the <a
href="https://datafusion.apache.org/comet/user-guide/tuning.html">Comet
Tuning Guide</a> for more information.</li>
+</ul>
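+<p>The change from per-batch to interval-based metric updates can be pictured as a simple time-based throttle. The Rust sketch below is illustrative only: <code>MetricsThrottle</code> and its methods are hypothetical names, not part of Comet, and the 3-second interval is taken from the default mentioned above.</p>

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper that decides when accumulated metrics should be published.
struct MetricsThrottle {
    interval: Duration,
    last_update: Instant,
}

impl MetricsThrottle {
    /// `now` is passed in explicitly so the behavior is easy to test.
    fn new(interval: Duration, now: Instant) -> Self {
        Self { interval, last_update: now }
    }

    /// Returns true, and restarts the timer, once `interval` has elapsed
    /// since the last published update. Otherwise returns false.
    fn should_update(&mut self, now: Instant) -> bool {
        if now.duration_since(self.last_update) >= self.interval {
            self.last_update = now;
            true
        } else {
            false
        }
    }
}
```

+<p>A caller would check <code>should_update</code> after each batch and publish metrics only when it returns <code>true</code>, so the cost of crossing the JNI boundary is paid at most once per interval instead of once per batch.</p>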
+<h2>Bug Fixes</h2>
+<ul>
+<li>Hashing of decimal values with precision &lt;= 18 is now
compatible with Spark</li>
+<li>Comet falls back to Spark when hashing decimals with precision
&gt; 18</li>
+</ul>
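+<p>The post does not say why the boundary sits at precision 18. A likely reason, stated here as an assumption, is that 18 is the largest precision for which every unscaled value fits in a signed 64-bit integer, so such decimals can be hashed as a plain long. The arithmetic can be checked with a small Rust sketch (the function name is hypothetical):</p>

```rust
/// Largest unscaled value of a decimal with `precision` digits (10^precision - 1),
/// or None if 10^precision does not fit in an i64.
fn max_unscaled_i64(precision: u32) -> Option<i64> {
    10i64.checked_pow(precision).map(|p| p - 1)
}
```

+<p><code>max_unscaled_i64(18)</code> yields <code>Some(999999999999999999)</code>, while <code>max_unscaled_i64(19)</code> yields <code>None</code> because 10^19 exceeds <code>i64::MAX</code>.</p>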
+<h2>Getting Involved</h2>
+<p>The Comet project welcomes new contributors. We use the same <a
href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack
and Discord</a> channels as the main DataFusion
+project and have a weekly <a
href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion
video call</a>.</p>
+<p>The easiest way to get involved is to test Comet with your current
Spark jobs and file issues for any bugs or
+performance regressions that you find. See the <a
href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting
Started</a> guide for instructions on downloading and installing
+Comet.</p>
+<p>There are also many <a
href="https://github.com/apache/datafusion-comet/contribute">good first
issues</a> waiting for contributions.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.5.0
Release</title><link
href="https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0"
rel="alternate"></link><published>2025-01-17T00:00:00+00:00</published><updated>2025-01-17T00:00:00+00:00</updated><author><name>pmc</nam
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/feeds/pmc.rss.xml b/blog/feeds/pmc.rss.xml
index 974782d..a0ed98c 100644
--- a/blog/feeds/pmc.rss.xml
+++ b/blog/feeds/pmc.rss.xml
@@ -1,5 +1,26 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion Blog -
pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Fri,
17 Jan 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet
0.5.0
Release</title><link>https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0</link><description><!--
+<rss version="2.0"><channel><title>Apache DataFusion Blog -
pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
17 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet
0.6.0
Release</title><link>https://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0</link><description><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>Comet runs on commodity hardware and aims to
…</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Mon, 17
Feb 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-02-17:/blog/2025/02/17/datafusion-comet-0.6.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.5.0
Release</title><link>https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0</link><descriptio
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/feeds/timsaucer.atom.xml b/blog/feeds/timsaucer.atom.xml
index 703d2a5..e292500 100644
--- a/blog/feeds/timsaucer.atom.xml
+++ b/blog/feeds/timsaucer.atom.xml
@@ -1,104 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
timsaucer</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/timsaucer.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-02-07T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Python 44.0.0 Released</title><link
href="https://datafusion.apache.org/blog/2025/02/07/datafusio [...]
-{% comment %}
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-{% endcomment %}
--->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python
44.0.0</a> has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch
…</code></p></summary><content type="html"><!--
-{% comment %}
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-{% endcomment %}
--->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python
44.0.0</a> has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch</code> from a
<code>RecordBatchStream</code> was a synchronous call, which would
-require the end user's code to wait for the data retrieval. This is described
in
-<a href="https://github.com/apache/datafusion-python/issues/974">Issue
974</a>. We continue to support this as a synchronous iterator, but we
have also added
-in the ability to retrieve the <code>RecordBatch</code> using the
Python asynchronous <code>anext</code>
-function.</p>
-<h1>Default Compression for Parquet files</h1>
-<p>With <a
href="https://github.com/apache/datafusion-python/pull/981">PR
981</a>, we change the saving of Parquet files to use zstd compression by
default.
-Previously the default was uncompressed, causing excessive disk storage. Zstd
is an
-excellent compression scheme that balances speed and compression ratio. Users
can still
-save their Parquet files uncompressed by passing in the appropriate value to
the
-<code>compression</code> argument when calling
<code>DataFrame.write_parquet</code>.</p>
-<h2><code>uv</code> package management</h2>
-<p><a href="https://github.com/astral-sh/uv">uv</a> is an
extremely fast Python package manager, written in Rust. In the previous version
-of <code>datafusion-python</code> we had a combination of settings
of PyPi and Conda. Instead, we
-switch to using <a href="https://github.com/astral-sh/uv">uv</a>
is our primary method for dependency management.</p>
-<p>For most users of DataFusion, this change will be transparent. You
can still install
-via <code>pip</code> or <code>conda</code>. For
developers, the instructions in the repository have been updated.</p>
-<h2>Migration Guide</h2>
-<p>During the upgrade from <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/43.0.0.md">DataFusion
43.0.0</a> to <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> as our upstream core
-dependency, we discovered a few changes were necessary within our repository
and our
-unit tests. These notes serve to help guide users who may encounter similar
issues when
-upgrading.</p>
-<ul>
-<li><code>RuntimeConfig</code> is now deprecated in favor of
<code>RuntimeEnvBuilder</code>. The migration is
-fairly straightforward, and the corresponding classes have been marked as
deprecated. For
-end users it should be simply a matter of changing the class name.</li>
-<li>If you perform a <code>concat</code> of a
<code>string_view</code> and <code>string</code>, it
will now return a
-<code>string_view</code> instead of a
<code>string</code>. This likely only impacts unit tests that are
validating
-return types. In general, it is recommended to switch to using
<code>string_view</code> whenever
-possible. You can see the blog articles <a
href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/">String
View Pt 1</a> and <a
href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/">Pt
2</a> for more information
-on these performance improvements.</li>
-<li>The function <code>date_part</code> now returns an
<code>int32</code> instead of a <code>float64</code>.
This is likely
-only impactful to unit tests.</li>
-</ul>
-<h2>Appreciation</h2>
-<p>We would like to thank everyone who has helped with these releases
through their helpful
-conversations, code review, issue descriptions, and code authoring. We would
especially
-like to thank the following authors of PRs who made these releases possible,
listed in
-alphabetical order by username: <a
href="https://github.com/chenkovsky">@chenkovsky</a>, <a
href="https://github.com/ion-elgreco">@ion-elgreco</a>, <a
href="https://github.com/kylebarron">@kylebarron</a>, and
-<a href="https://github.com/kosiew">@kosiew</a>.</p>
-<p>Thank you!</p>
-<h2>Get Involved</h2>
-<p>The DataFusion Python team is an active and engaging community and we
would love
-to have you join us and help the project.</p>
-<p>Here are some ways to get involved:</p>
-<ul>
-<li>
-<p>Learn more by visiting the <a
href="https://datafusion.apache.org/python/index.html">DataFusion Python
project</a> page.</p>
-</li>
-<li>
-<p>Try out the project and provide feedback, file issues, and contribute
code.</p>
-</li>
-<li>
-<p>Join us on <a href="https://s.apache.org/slack-invite">ASF
Slack</a> or the <a href="https://discord.gg/Qw5gKqHxUM">Arrow Rust
Discord Server</a>.</p>
-</li>
-</ul></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Python 43.1.0
Released</title><link
href="https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0"
rel="alternate"></link><published>2024-12-14T00:00:00+00:00</published><updated>2024-12-14T00:00:00+00:00</updated><author><name>timsaucer</name></author><id>tag:datafusion.apache.org,2024-12-14:/blog/2024/12/14/datafusion-python-43.1.0</id><summary
type="html"><!--
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
timsaucer</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/timsaucer.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2024-12-14T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Python 43.1.0 Released</title><link
href="https://datafusion.apache.org/blog/2024/12/14/datafusio [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/feeds/timsaucer.rss.xml b/blog/feeds/timsaucer.rss.xml
index 74acdfa..fc20003 100644
--- a/blog/feeds/timsaucer.rss.xml
+++ b/blog/feeds/timsaucer.rss.xml
@@ -1,27 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion Blog -
timsaucer</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Fri,
07 Feb 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion
Python 44.0.0
Released</title><link>https://datafusion.apache.org/blog/2025/02/07/datafusion-python-44.0.0</link><description><!--
-{% comment %}
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-{% endcomment %}
--->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python
44.0.0</a> has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch
…</code></p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">timsaucer</dc:creator><pubDate>Fri,
07 Feb 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-02-07:/blog/2025/02/07/datafusion-python-44.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Python 43.1.0
Released</title><link>https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0
[...]
+<rss version="2.0"><channel><title>Apache DataFusion Blog -
timsaucer</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Sat,
14 Dec 2024 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion
Python 43.1.0
Released</title><link>https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/images/datafusion-ballista-43.0.0/ballista-logo.png
b/blog/images/datafusion-ballista-43.0.0/ballista-logo.png
new file mode 100644
index 0000000..1ede07b
Binary files /dev/null and
b/blog/images/datafusion-ballista-43.0.0/ballista-logo.png differ
diff --git a/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png
b/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png
new file mode 100644
index 0000000..5e30bde
Binary files /dev/null and
b/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png differ
diff --git a/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png
b/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png
new file mode 100644
index 0000000..969043f
Binary files /dev/null and
b/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png differ
diff --git
a/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png
b/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png
new file mode 100644
index 0000000..04f044c
Binary files /dev/null and
b/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png differ
diff --git a/blog/index.html b/blog/index.html
index a8205c2..befc312 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -50,8 +50,8 @@
<article class="post">
<header>
<div class="title">
- <h1><a
href="/blog/2025/02/07/datafusion-python-44.0.0">Apache DataFusion Python
44.0.0 Released</a></h1>
- <p>Posted on: Fri 07 February 2025 by timsaucer</p>
+ <h1><a
href="/blog/2025/02/17/datafusion-comet-0.6.0">Apache DataFusion Comet 0.6.0
Release</a></h1>
+ <p>Posted on: Mon 17 February 2025 by pmc</p>
<p><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
@@ -70,14 +70,53 @@ See the License for the specific language governing
permissions and
limitations under the License.
{% endcomment %}
-->
-<p>We are happy to announce that <a
href="https://pypi.org/project/datafusion/44.0.0/">datafusion-python 44.0.0</a>
has been released. This release
-brings in all of the new features of the core <a
href="https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md">DataFusion
44.0.0</a> library. You can see the
-full details of the improvements in the <a
href="https://github.com/apache/datafusion-python/tree/main/dev/changelog">changelogs</a>.</p>
-<h2>Asynchronous Iteration of Record Batches</h2>
-<p>Retrieving a <code>RecordBatch …</code></p></p>
+<p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the <a
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark physical
plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.</p>
+<p>Comet runs on commodity hardware and aims to …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/02/17/datafusion-comet-0.6.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/02/02/datafusion-ballista-43.0.0">Apache DataFusion Ballista
43.0.0 Released</a></h1>
+ <p>Posted on: Sun 02 February 2025 by milenkovicm</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We are pleased to announce version <a
href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a>
of the <a href="https://datafusion.apache.org/ballista/">DataFusion
Ballista</a>. Ballista allows existing <a
href="https://datafusion.apache.org">DataFusion</a> applications to be scaled
out on a cluster for use cases that are not practical to run on a single
node.</p>
+<h2>Highlights of this release</h2>
+<h3>Seamless Integration with DataFusion</h3>
+<p>The primary objective of …</p></p>
<footer>
<ul class="actions">
- <div style="text-align: right"><a
href="/blog/2025/02/07/datafusion-python-44.0.0" class="button medium">Continue
Reading</a></div>
+ <div style="text-align: right"><a
href="/blog/2025/02/02/datafusion-ballista-43.0.0" class="button
medium">Continue Reading</a></div>
</ul>
<ul class="stats">
</ul>