This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push:
new 5d3acf2 Commit build products
5d3acf2 is described below
commit 5d3acf2563ac8f0e71dfac3d35afbb3b023c812e
Author: Build Pelican (action) <[email protected]>
AuthorDate: Thu Nov 20 15:25:26 2025 +0000
Commit build products
---
blog/2025/11/25/datafusion-51.0.0/index.html | 34 ++++++++++---------
blog/author/pmc.html | 2 +-
blog/category/blog.html | 2 +-
blog/feed.xml | 2 +-
blog/feeds/all-en.atom.xml | 36 ++++++++++++---------
blog/feeds/blog.atom.xml | 36 ++++++++++++---------
blog/feeds/pmc.atom.xml | 36 ++++++++++++---------
blog/feeds/pmc.rss.xml | 2 +-
.../performance_over_time_clickbench.png | Bin 0 -> 61910 bytes
blog/index.html | 2 +-
10 files changed, 84 insertions(+), 68 deletions(-)
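Several of the blog changes in this commit describe how a default `metadata_size_hint` saves a round trip when reading Parquet footers from object storage (#18118). As a minimal sketch of that idea, the std-only Rust program below assumes a mock store in which every range read counts as one request, plus the standard Parquet tail layout (metadata, a 4-byte little-endian metadata length, then the `PAR1` magic bytes). `MockStore`, `build_file`, and `read_footer` are hypothetical names chosen for illustration, not DataFusion or `object_store` APIs, and the sketch models only the footer reads, not the full per-file request count quoted in the post.

```rust
use std::cell::Cell;
use std::ops::Range;

/// Hypothetical stand-in for a remote object: each `get_range` call counts as one request.
struct MockStore {
    data: Vec<u8>,
    requests: Cell<usize>,
}

impl MockStore {
    fn get_range(&self, range: Range<usize>) -> Vec<u8> {
        self.requests.set(self.requests.get() + 1);
        self.data[range].to_vec()
    }
}

/// Builds a fake file with the Parquet tail layout:
/// [body][metadata][4-byte little-endian metadata length][b"PAR1"].
fn build_file(body_len: usize, metadata_len: usize) -> Vec<u8> {
    let mut data = vec![0u8; body_len];
    data.extend(std::iter::repeat(0xABu8).take(metadata_len)); // placeholder metadata bytes
    data.extend((metadata_len as u32).to_le_bytes());
    data.extend(b"PAR1");
    data
}

/// Returns the metadata bytes. `size_hint` is the number of trailing bytes to
/// fetch on the first request; `None` fetches only the fixed 8-byte footer.
fn read_footer(store: &MockStore, size_hint: Option<usize>) -> Vec<u8> {
    let file_len = store.data.len();
    let first_len = size_hint.unwrap_or(8).max(8).min(file_len);
    let tail = store.get_range(file_len - first_len..file_len);

    // The last 8 bytes of the file are always inside `tail`.
    let footer = &tail[tail.len() - 8..];
    assert_eq!(&footer[4..], b"PAR1", "missing Parquet magic bytes");
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&footer[..4]);
    let metadata_len = u32::from_le_bytes(len_bytes) as usize;

    let needed = metadata_len + 8;
    if needed <= tail.len() {
        // The speculative read already covered the metadata: no extra round trip.
        tail[tail.len() - needed..tail.len() - 8].to_vec()
    } else {
        // No hint, or a hint smaller than the footer: one more request is needed.
        store.get_range(file_len - needed..file_len - 8)
    }
}

fn main() {
    let file = build_file(1_000_000, 20_000);
    // Second entry is a 512KB hint, assumed here to mean 512 * 1024 bytes.
    for hint in [None, Some(512 * 1024)] {
        let store = MockStore { data: file.clone(), requests: Cell::new(0) };
        let metadata = read_footer(&store, hint);
        println!(
            "hint={:?}: {} metadata bytes in {} request(s)",
            hint,
            metadata.len(),
            store.requests.get()
        );
    }
}
```

Running it prints `hint=None: 20000 metadata bytes in 2 request(s)` followed by `hint=Some(524288): 20000 metadata bytes in 1 request(s)`. The design trade-off is that a large hint may download bytes that are not metadata; when the hint turns out to be smaller than the footer, the reader falls back to a second request, so correctness never depends on the hint being large enough.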
diff --git a/blog/2025/11/25/datafusion-51.0.0/index.html
b/blog/2025/11/25/datafusion-51.0.0/index.html
index f6077cf..ac5a65d 100644
--- a/blog/2025/11/25/datafusion-51.0.0/index.html
+++ b/blog/2025/11/25/datafusion-51.0.0/index.html
@@ -95,7 +95,10 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blob/b
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update this image</p>
+<p><strong>Figure 1</strong>: Average and median normalized query execution
times for ClickBench queries for DataFusion 51.0.0 compared to previous
releases.
+Query times are normalized using the ClickBench definition. See the
+<a href="https://alamb.github.io/datafusion-benchmarking/">DataFusion
Benchmarking Page</a>
+for more details.</p>
<h3 id="faster-case-expression-evaluation">Faster <code>CASE</code> expression
evaluation<a class="headerlink" href="#faster-case-expression-evaluation"
title="Permanent link">¶</a></h3>
<p>This release builds on the <a
href="https://github.com/apache/datafusion/issues/18075">CASE performance
epic</a> with significant improvements.
Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
@@ -103,19 +106,21 @@ scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.c
and <a href="https://github.com/petern48">petern48</a> for leading this
effort. We hope to share more details on our
implementation in a future post.</p>
<p><strong>Fewer object store round-trips for Parquet by Default</strong></p>
-<p>DataFusion now sets a default <code>metadata_size_hint</code> for Parquet
scans
+<p>DataFusion now sets a default <code>metadata_size_hint</code> for <a
href="https://parquet.apache.org/">Apache Parquet</a> scans
(<a href="https://github.com/apache/datafusion/issues/18118">#18118</a>),
avoiding the extra
“last 8‑byte” request many clouds require to read file footers. Remote scans
typically drop from five requests to four per file, cutting latency and
transfer
costs without any application changes. Thanks to <a
href="https://github.com/zhuqi-lucas">zhuqi-lucas</a> for leading this
effort.</p>
<h3 id="faster-parquet-metadata-parsing">Faster Parquet metadata parsing<a
class="headerlink" href="#faster-parquet-metadata-parsing" title="Permanent
link">¶</a></h3>
-<p>DataFusion 51 also includes the latest Parquet reader improvements from
-<a href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, delivering faster Parquet metadata parsing. This is
+<p>DataFusion 51 also includes the latest Parquet reader from
+<a href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which is significantly faster at parsing Parquet metadata. This is
especially beneficial for workloads with many small Parquet files and scenarios
-where startup time or low latency is important. Thanks to upstream work by
-<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> for leading this effort.</p>
+where startup time or low latency is important. You can read more about the
upstream work by
+<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements
+in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a> blog.</p>
<p><img alt="Metadata Parsing Performance Improvements in Arrow/Parquet 57"
class="img-responsive"
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"/></p>
+<p><strong>Figure 2</strong>: Metadata parsing performance improvements in
Arrow/Parquet 57.0.0. </p>
<h3 id="better-defaults-for-remote-parquet-reads">Better Defaults for Remote
Parquet Reads<a class="headerlink"
href="#better-defaults-for-remote-parquet-reads" title="Permanent
link">¶</a></h3>
<p>DataFusion by default now fetches the last 512KB (configurable) of Parquet
files
so the first request usually includes the full footer (<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>). This will
@@ -180,7 +185,7 @@ strategies. Thanks to <a
href="https://github.com/BlakeOrth">BlakeOrth</a> for l
<p><code>DESCRIBE</code> now works on arbitrary queries, returning the schema
instead
of being an alias for <code>EXPLAIN</code> (<a
href="https://github.com/apache/datafusion/issues/18234">#18234</a>). This
brings DataFusion in line with engines
like DuckDB and makes it easy to inspect the output schema of queries
-without executing them.</p>
+without executing them. Thanks to <a
href="https://github.com/djanderson">djanderson</a> for leading this effort.</p>
<p>For example:</p>
<pre><code class="language-sql">DataFusion CLI v51.0.0
> create table t(a int, b varchar, c float) as values (1, 'a', 2.0);
@@ -203,14 +208,15 @@ Elapsed 0.002 seconds.
for scalar, aggregate, and window functions (<a
href="https://github.com/apache/datafusion/issues/17379">#17379</a>). You can
mix positional and named
arguments in any order, and error messages now list parameter names to make
diagnostics clearer. UDF authors can also expose parameter names so their
-functions benefit from the same syntax.</p>
+functions benefit from the same syntax. Thanks to <a
href="https://github.com/timsaucer">timsaucer</a> and <a
href="https://github.com/bubulalabu">bubulalabu</a> for leading this effort.</p>
<p>For example, you can pass arguments to functions like this:</p>
<pre><code class="language-sql">SELECT power(exponent => 3.0, base =>
2.0);
</code></pre>
<h3 id="metrics-improvement">Metrics improvement<a class="headerlink"
href="#metrics-improvement" title="Permanent link">¶</a></h3>
<p>The output of <a
href="https://datafusion.apache.org/user-guide/sql/explain.html#explain-analyze">EXPLAIN
ANALYZE</a> has been improved to include more metrics
-about execution time and memory usage of each operator in the query plan.
-Read about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>.</p>
+about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
+You can find more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
+<a href="https://github.com/2010YOUY01">2010YOUY01</a> for leading this
effort.</p>
<p>The <code>51.0.0</code> release adds:</p>
<ul>
<li><strong>Configuration</strong>: adds a new option
<code>datafusion.explain.analyze_level</code>, which can be set to
<code>summary</code> for a concise output or <code>dev</code> for the full set
of metrics (the previous default).</li>
@@ -229,11 +235,9 @@ explain analyze
select count(*)
from
'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet'
where "URL" <> '';
-
-Now shows easier-to-understand metrics such as:
-
-```text
- metrics=[
+</code></pre>
+<p>Now shows easier-to-understand metrics such as:</p>
+<pre><code class="language-text"> metrics=[
output_rows=1000000,
elapsed_compute=16ns,
output_bytes=222.5 MB,
diff --git a/blog/author/pmc.html b/blog/author/pmc.html
index 92083a4..9a8aa92 100644
--- a/blog/author/pmc.html
+++ b/blog/author/pmc.html
@@ -54,7 +54,7 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blob/b
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update …</p> </div><!-- /.entry-content -->
+<p><strong>Figure 1 …</strong></p> </div><!-- /.entry-content -->
</article></li>
<li><article class="hentry">
<header> <h2 class="entry-title"><a
href="https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0"
rel="bookmark" title="Permalink to Apache DataFusion Comet 0.11.0
Release">Apache DataFusion Comet 0.11.0 Release</a></h2> </header>
diff --git a/blog/category/blog.html b/blog/category/blog.html
index 637163a..065b364 100644
--- a/blog/category/blog.html
+++ b/blog/category/blog.html
@@ -55,7 +55,7 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blob/b
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update …</p> </div><!-- /.entry-content -->
+<p><strong>Figure 1 …</strong></p> </div><!-- /.entry-content -->
</article></li>
<li><article class="hentry">
<header> <h2 class="entry-title"><a
href="https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0"
rel="bookmark" title="Permalink to Apache DataFusion Comet 0.11.0
Release">Apache DataFusion Comet 0.11.0 Release</a></h2> </header>
diff --git a/blog/feed.xml b/blog/feed.xml
index da62385..9d01a04 100644
--- a/blog/feed.xml
+++ b/blog/feed.xml
@@ -25,7 +25,7 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blo
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update …</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 25
Nov 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.11.0
Release</title><link>https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0</link><description><!--
+<p><strong>Figure 1
…</strong></p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 25
Nov 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.11.0
Release</title><link>https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index 95cd13b..c4afbbf 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -25,7 +25,7 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blo
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update …</p></summary><content type="html"><!--
+<p><strong>Figure 1 …</strong></p></summary><content
type="html"><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -51,7 +51,10 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blo
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update this image</p>
+<p><strong>Figure 1</strong>: Average and median normalized
query execution times for ClickBench queries for DataFusion 51.0.0 compared to
previous releases.
+Query times are normalized using the ClickBench definition. See the
+<a href="https://alamb.github.io/datafusion-benchmarking/">DataFusion
Benchmarking Page</a>
+for more details.</p>
<h3 id="faster-case-expression-evaluation">Faster
<code>CASE</code> expression evaluation<a class="headerlink"
href="#faster-case-expression-evaluation" title="Permanent
link">¶</a></h3>
<p>This release builds on the <a
href="https://github.com/apache/datafusion/issues/18075">CASE performance
epic</a> with significant improvements.
Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
@@ -59,19 +62,21 @@ scattering, speeding up common ETL patterns. Thanks to
<a href="https://githu
and <a href="https://github.com/petern48">petern48</a> for leading
this effort. We hope to share more details on our
implementation in a future post.</p>
<p><strong>Fewer object store round-trips for Parquet by
Default</strong></p>
-<p>DataFusion now sets a default
<code>metadata_size_hint</code> for Parquet scans
+<p>DataFusion now sets a default
<code>metadata_size_hint</code> for <a
href="https://parquet.apache.org/">Apache Parquet</a> scans
(<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>),
avoiding the extra
“last 8‑byte” request many clouds require to read file footers. Remote scans
typically drop from five requests to four per file, cutting latency and
transfer
costs without any application changes. Thanks to <a
href="https://github.com/zhuqi-lucas">zhuqi-lucas</a> for leading this
effort.</p>
<h3 id="faster-parquet-metadata-parsing">Faster Parquet metadata
parsing<a class="headerlink" href="#faster-parquet-metadata-parsing"
title="Permanent link">¶</a></h3>
-<p>DataFusion 51 also includes the latest Parquet reader improvements
from
-<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, delivering faster Parquet metadata parsing. This is
+<p>DataFusion 51 also includes the latest Parquet reader from
+<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which is significantly faster at parsing Parquet metadata. This
is
especially beneficial for workloads with many small Parquet files and scenarios
-where startup time or low latency is important. Thanks to upstream work by
-<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> for leading this
effort.</p>
+where startup time or low latency is important. You can read more about the
upstream work by
+<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements
+in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog.</p>
<p><img alt="Metadata Parsing Performance Improvements in
Arrow/Parquet 57" class="img-responsive"
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"/></p>
+<p><strong>Figure 2</strong>: Metadata parsing performance
improvements in Arrow/Parquet 57.0.0. </p>
<h3 id="better-defaults-for-remote-parquet-reads">Better Defaults for
Remote Parquet Reads<a class="headerlink"
href="#better-defaults-for-remote-parquet-reads" title="Permanent
link">¶</a></h3>
<p>DataFusion by default now fetches the last 512KB (configurable) of
Parquet files
so the first request usually includes the full footer (<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>).
This will
@@ -136,7 +141,7 @@ strategies. Thanks to <a
href="https://github.com/BlakeOrth">BlakeOrth<
<p><code>DESCRIBE</code> now works on arbitrary queries,
returning the schema instead
of being an alias for <code>EXPLAIN</code> (<a
href="https://github.com/apache/datafusion/issues/18234">#18234</a>).
This brings DataFusion in line with engines
like DuckDB and makes it easy to inspect the output schema of queries
-without executing them.</p>
+without executing them. Thanks to <a
href="https://github.com/djanderson">djanderson</a> for leading this
effort.</p>
<p>For example:</p>
<pre><code class="language-sql">DataFusion CLI v51.0.0
&gt; create table t(a int, b varchar, c float) as values (1, 'a', 2.0);
@@ -159,14 +164,15 @@ Elapsed 0.002 seconds.
for scalar, aggregate, and window functions (<a
href="https://github.com/apache/datafusion/issues/17379">#17379</a>).
You can mix positional and named
arguments in any order, and error messages now list parameter names to make
diagnostics clearer. UDF authors can also expose parameter names so their
-functions benefit from the same syntax.</p>
+functions benefit from the same syntax. Thanks to <a
href="https://github.com/timsaucer">timsaucer</a> and <a
href="https://github.com/bubulalabu">bubulalabu</a> for leading this
effort.</p>
<p>For example, you can pass arguments to functions like this:</p>
<pre><code class="language-sql">SELECT power(exponent =&gt;
3.0, base =&gt; 2.0);
</code></pre>
<h3 id="metrics-improvement">Metrics improvement<a class="headerlink"
href="#metrics-improvement" title="Permanent link">¶</a></h3>
<p>The output of <a
href="https://datafusion.apache.org/user-guide/sql/explain.html#explain-analyze">EXPLAIN
ANALYZE</a> has been improved to include more metrics
-about execution time and memory usage of each operator in the query plan.
-Read about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>.</p>
+about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
+You can find more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
+<a href="https://github.com/2010YOUY01">2010YOUY01</a> for leading
this effort.</p>
<p>The <code>51.0.0</code> release adds:</p>
<ul>
<li><strong>Configuration</strong>: adds a new option
<code>datafusion.explain.analyze_level</code>, which can be set to
<code>summary</code> for a concise output or
<code>dev</code> for the full set of metrics (the previous
default).</li>
@@ -185,11 +191,9 @@ explain analyze
select count(*)
from
'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet'
where "URL" &lt;&gt; '';
-
-Now shows easier-to-understand metrics such as:
-
-```text
- metrics=[
+</code></pre>
+<p>Now shows easier-to-understand metrics such as:</p>
+<pre><code class="language-text"> metrics=[
output_rows=1000000,
elapsed_compute=16ns,
output_bytes=222.5 MB,
diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index 367ce27..ce68dfc 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -25,7 +25,7 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blo
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update …</p></summary><content type="html"><!--
+<p><strong>Figure 1 …</strong></p></summary><content
type="html"><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -51,7 +51,10 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blo
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update this image</p>
+<p><strong>Figure 1</strong>: Average and median normalized
query execution times for ClickBench queries for DataFusion 51.0.0 compared to
previous releases.
+Query times are normalized using the ClickBench definition. See the
+<a href="https://alamb.github.io/datafusion-benchmarking/">DataFusion
Benchmarking Page</a>
+for more details.</p>
<h3 id="faster-case-expression-evaluation">Faster
<code>CASE</code> expression evaluation<a class="headerlink"
href="#faster-case-expression-evaluation" title="Permanent
link">¶</a></h3>
<p>This release builds on the <a
href="https://github.com/apache/datafusion/issues/18075">CASE performance
epic</a> with significant improvements.
Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
@@ -59,19 +62,21 @@ scattering, speeding up common ETL patterns. Thanks to
<a href="https://githu
and <a href="https://github.com/petern48">petern48</a> for leading
this effort. We hope to share more details on our
implementation in a future post.</p>
<p><strong>Fewer object store round-trips for Parquet by
Default</strong></p>
-<p>DataFusion now sets a default
<code>metadata_size_hint</code> for Parquet scans
+<p>DataFusion now sets a default
<code>metadata_size_hint</code> for <a
href="https://parquet.apache.org/">Apache Parquet</a> scans
(<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>),
avoiding the extra
“last 8‑byte” request many clouds require to read file footers. Remote scans
typically drop from five requests to four per file, cutting latency and
transfer
costs without any application changes. Thanks to <a
href="https://github.com/zhuqi-lucas">zhuqi-lucas</a> for leading this
effort.</p>
<h3 id="faster-parquet-metadata-parsing">Faster Parquet metadata
parsing<a class="headerlink" href="#faster-parquet-metadata-parsing"
title="Permanent link">¶</a></h3>
-<p>DataFusion 51 also includes the latest Parquet reader improvements
from
-<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, delivering faster Parquet metadata parsing. This is
+<p>DataFusion 51 also includes the latest Parquet reader from
+<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which is significantly faster at parsing Parquet metadata. This
is
especially beneficial for workloads with many small Parquet files and scenarios
-where startup time or low latency is important. Thanks to upstream work by
-<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> for leading this
effort.</p>
+where startup time or low latency is important. You can read more about the
upstream work by
+<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements
+in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog.</p>
<p><img alt="Metadata Parsing Performance Improvements in
Arrow/Parquet 57" class="img-responsive"
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"/></p>
+<p><strong>Figure 2</strong>: Metadata parsing performance
improvements in Arrow/Parquet 57.0.0. </p>
<h3 id="better-defaults-for-remote-parquet-reads">Better Defaults for
Remote Parquet Reads<a class="headerlink"
href="#better-defaults-for-remote-parquet-reads" title="Permanent
link">¶</a></h3>
<p>DataFusion by default now fetches the last 512KB (configurable) of
Parquet files
so the first request usually includes the full footer (<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>).
This will
@@ -136,7 +141,7 @@ strategies. Thanks to <a
href="https://github.com/BlakeOrth">BlakeOrth<
<p><code>DESCRIBE</code> now works on arbitrary queries,
returning the schema instead
of being an alias for <code>EXPLAIN</code> (<a
href="https://github.com/apache/datafusion/issues/18234">#18234</a>).
This brings DataFusion in line with engines
like DuckDB and makes it easy to inspect the output schema of queries
-without executing them.</p>
+without executing them. Thanks to <a
href="https://github.com/djanderson">djanderson</a> for leading this
effort.</p>
<p>For example:</p>
<pre><code class="language-sql">DataFusion CLI v51.0.0
&gt; create table t(a int, b varchar, c float) as values (1, 'a', 2.0);
@@ -159,14 +164,15 @@ Elapsed 0.002 seconds.
for scalar, aggregate, and window functions (<a
href="https://github.com/apache/datafusion/issues/17379">#17379</a>).
You can mix positional and named
arguments in any order, and error messages now list parameter names to make
diagnostics clearer. UDF authors can also expose parameter names so their
-functions benefit from the same syntax.</p>
+functions benefit from the same syntax. Thanks to <a
href="https://github.com/timsaucer">timsaucer</a> and <a
href="https://github.com/bubulalabu">bubulalabu</a> for leading this
effort.</p>
<p>For example, you can pass arguments to functions like this:</p>
<pre><code class="language-sql">SELECT power(exponent =&gt;
3.0, base =&gt; 2.0);
</code></pre>
<h3 id="metrics-improvement">Metrics improvement<a class="headerlink"
href="#metrics-improvement" title="Permanent link">¶</a></h3>
<p>The output of <a
href="https://datafusion.apache.org/user-guide/sql/explain.html#explain-analyze">EXPLAIN
ANALYZE</a> has been improved to include more metrics
-about execution time and memory usage of each operator in the query plan.
-Read about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>.</p>
+about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
+You can find more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
+<a href="https://github.com/2010YOUY01">2010YOUY01</a> for leading
this effort.</p>
<p>The <code>51.0.0</code> release adds:</p>
<ul>
<li><strong>Configuration</strong>: adds a new option
<code>datafusion.explain.analyze_level</code>, which can be set to
<code>summary</code> for a concise output or
<code>dev</code> for the full set of metrics (the previous
default).</li>
@@ -185,11 +191,9 @@ explain analyze
select count(*)
from
'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet'
where "URL" &lt;&gt; '';
-
-Now shows easier-to-understand metrics such as:
-
-```text
- metrics=[
+</code></pre>
+<p>Now shows easier-to-understand metrics such as:</p>
+<pre><code class="language-text"> metrics=[
output_rows=1000000,
elapsed_compute=16ns,
output_bytes=222.5 MB,
diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml
index cf4006b..06c1af5 100644
--- a/blog/feeds/pmc.atom.xml
+++ b/blog/feeds/pmc.atom.xml
@@ -25,7 +25,7 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blo
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update …</p></summary><content type="html"><!--
+<p><strong>Figure 1 …</strong></p></summary><content
type="html"><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -51,7 +51,10 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blo
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update this image</p>
+<p><strong>Figure 1</strong>: Average and median normalized
query execution times for ClickBench queries for DataFusion 51.0.0 compared to
previous releases.
+Query times are normalized using the ClickBench definition. See the
+<a href="https://alamb.github.io/datafusion-benchmarking/">DataFusion
Benchmarking Page</a>
+for more details.</p>
<h3 id="faster-case-expression-evaluation">Faster
<code>CASE</code> expression evaluation<a class="headerlink"
href="#faster-case-expression-evaluation" title="Permanent
link">¶</a></h3>
<p>This release builds on the <a
href="https://github.com/apache/datafusion/issues/18075">CASE performance
epic</a> with significant improvements.
Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
@@ -59,19 +62,21 @@ scattering, speeding up common ETL patterns. Thanks to
<a href="https://githu
and <a href="https://github.com/petern48">petern48</a> for leading
this effort. We hope to share more details on our
implementation in a future post.</p>
<p><strong>Fewer object store round-trips for Parquet by
Default</strong></p>
-<p>DataFusion now sets a default
<code>metadata_size_hint</code> for Parquet scans
+<p>DataFusion now sets a default
<code>metadata_size_hint</code> for <a
href="https://parquet.apache.org/">Apache Parquet</a> scans
(<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>),
avoiding the extra
“last 8‑byte” request many clouds require to read file footers. Remote scans
typically drop from five requests to four per file, cutting latency and
transfer
costs without any application changes. Thanks to <a
href="https://github.com/zhuqi-lucas">zhuqi-lucas</a> for leading this
effort.</p>
<h3 id="faster-parquet-metadata-parsing">Faster Parquet metadata
parsing<a class="headerlink" href="#faster-parquet-metadata-parsing"
title="Permanent link">¶</a></h3>
-<p>DataFusion 51 also includes the latest Parquet reader improvements
from
-<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, delivering faster Parquet metadata parsing. This is
+<p>DataFusion 51 also includes the latest Parquet reader from
+<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which is significantly faster at parsing Parquet metadata. This
is
especially beneficial for workloads with many small Parquet files and scenarios
-where startup time or low latency is important. Thanks to upstream work by
-<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> for leading this
effort.</p>
+where startup time or low latency is important. You can read more about the
upstream work by
+<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements
+in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog post.</p>
<p><img alt="Metadata Parsing Performance Improvements in
Arrow/Parquet 57" class="img-responsive"
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"/></p>
+<p><strong>Figure 2</strong>: Metadata parsing performance
improvements in Arrow/Parquet 57.0.0. </p>
<h3 id="better-defaults-for-remote-parquet-reads">Better Defaults for
Remote Parquet Reads<a class="headerlink"
href="#better-defaults-for-remote-parquet-reads" title="Permanent
link">¶</a></h3>
<p>DataFusion by default now fetches the last 512KB (configurable) of
Parquet files
so the first request usually includes the full footer (<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>).
This will
@@ -136,7 +141,7 @@ strategies. Thanks to <a
href="https://github.com/BlakeOrth">BlakeOrth<
<p><code>DESCRIBE</code> now works on arbitrary queries,
returning the schema instead
of being an alias for <code>EXPLAIN</code> (<a
href="https://github.com/apache/datafusion/issues/18234">#18234</a>).
This brings DataFusion in line with engines
like DuckDB and makes it easy to inspect the output schema of queries
-without executing them.</p>
+without executing them. Thanks to <a
href="https://github.com/djanderson">djanderson</a> for leading this
effort.</p>
<p>For example:</p>
<pre><code class="language-sql">DataFusion CLI v51.0.0
&gt; create table t(a int, b varchar, c float) as values (1, 'a', 2.0);
@@ -159,14 +164,15 @@ Elapsed 0.002 seconds.
for scalar, aggregate, and window functions (<a
href="https://github.com/apache/datafusion/issues/17379">#17379</a>).
You can mix positional and named
arguments in any order, and error messages now list parameter names to make
diagnostics clearer. UDF authors can also expose parameter names so their
-functions benefit from the same syntax.</p>
+functions benefit from the same syntax. Thanks to <a
href="https://github.com/timsaucer">timsaucer</a> and <a
href="https://github.com/bubulalabu">bubulalabu</a> for leading this
effort.</p>
<p>For example, you can pass arguments to functions like this:</p>
<pre><code class="language-sql">SELECT power(exponent =&gt;
3.0, base =&gt; 2.0);
</code></pre>
<h3 id="metrics-improvement">Metrics improvement<a class="headerlink"
href="#metrics-improvement" title="Permanent link">¶</a></h3>
<p>The output of <a
href="https://datafusion.apache.org/user-guide/sql/explain.html#explain-analyze">EXPLAIN
ANALYZE</a> has been improved to include more metrics
-about execution time and memory usage of each operator in the query plan.
-Read about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>.</p>
+about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
+You can find more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
+<a href="https://github.com/2010YOUY01">2010YOUY01</a> for leading
this effort.</p>
<p>The <code>51.0.0</code> release adds:</p>
<ul>
<li><strong>Configuration</strong>: adds a new option
<code>datafusion.explain.analyze_level</code>, which can be set to
<code>summary</code> for a concise output or
<code>dev</code> for the full set of metrics (the previous
default).</li>
@@ -185,11 +191,9 @@ explain analyze
select count(*)
from
'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet'
where "URL" &lt;&gt; '';
-
-Now shows easier-to-understand metrics such as:
-
-```text
- metrics=[
+</code></pre>
+<p>Now shows easier-to-understand metrics such as:</p>
+<pre><code class="language-text"> metrics=[
output_rows=1000000,
elapsed_compute=16ns,
output_bytes=222.5 MB,
diff --git a/blog/feeds/pmc.rss.xml b/blog/feeds/pmc.rss.xml
index 571d15b..f953959 100644
--- a/blog/feeds/pmc.rss.xml
+++ b/blog/feeds/pmc.rss.xml
@@ -25,7 +25,7 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blo
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update …</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 25
Nov 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.11.0
Release</title><link>https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0</link><description><!--
+<p><strong>Figure 1
…</strong></p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 25
Nov 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.11.0
Release</title><link>https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png
b/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png
new file mode 100644
index 0000000..a120152
Binary files /dev/null and
b/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png differ
diff --git a/blog/index.html b/blog/index.html
index e8f31a9..19e9154 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -79,7 +79,7 @@ changes is available in the <a
href="https://github.com/apache/datafusion/blob/b
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p>TODO: update …</p></p>
+<p><strong>Figure 1 …</strong></p></p>
<footer>
<ul class="actions">
<div style="text-align: right"><a
href="/blog/2025/11/25/datafusion-51.0.0" class="button medium">Continue
Reading</a></div>