This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git


The following commit(s) were added to refs/heads/asf-staging by this push:
     new f7dfd35  Commit build products
f7dfd35 is described below

commit f7dfd3561272dc80408b59a8d9f3b2d7b37de460
Author: Build Pelican (action) <[email protected]>
AuthorDate: Sat Jan 24 12:17:16 2026 +0000

    Commit build products
---
 blog/2026/01/08/datafusion-52.0.0/index.html | 19 ++++++++++---------
 blog/feeds/all-en.atom.xml                   | 19 ++++++++++---------
 blog/feeds/blog.atom.xml                     | 19 ++++++++++---------
 blog/feeds/pmc.atom.xml                      | 19 ++++++++++---------
 4 files changed, 40 insertions(+), 36 deletions(-)

diff --git a/blog/2026/01/08/datafusion-52.0.0/index.html 
b/blog/2026/01/08/datafusion-52.0.0/index.html
index 0752c1e..bca7f3c 100644
--- a/blog/2026/01/08/datafusion-52.0.0/index.html
+++ b/blog/2026/01/08/datafusion-52.0.0/index.html
@@ -117,8 +117,8 @@ leaving other queries unchanged or modestly faster. Thanks 
to <a href="https://g
 the implementation and reviews from <a 
href="https://github.com/Dandandan">Dandandan</a>.</p>
 <h3 id="caching-improvements">Caching Improvements<a class="headerlink" 
href="#caching-improvements" title="Permanent link">¶</a></h3>
 <p>This release also includes several additional caching improvements.</p>
-<p>A new statistics cache for Parquet Metadata avoids repeatedly 
(re)calculating
-statistics for Parquet backed files. This significantly improves planning time
+<p>A new statistics cache for File Metadata avoids repeatedly (re)calculating
+statistics for files. This significantly improves planning time
 for certain queries. You can see the contents of the new cache using the
 <a 
href="https://datafusion.apache.org/user-guide/cli/functions.html#statistics-cache">statistics_cache</a>
 function in the CLI:</p>
 <pre><code class="language-sql">select * from statistics_cache();
@@ -165,13 +165,14 @@ Related PRs: <a 
href="https://github.com/apache/datafusion/pull/18146">#18146</a
 dynamically to scans, as explained in the <a 
href="https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters">Dynamic
 Filtering Blog</a> using a
 technique referred to as <a 
href="https://dl.acm.org/doi/10.1109/ICDE.2008.4497486">Sideways Information 
Passing</a> in Database research
 literature. The initial implementation passed min/max values for the join keys.
-DataFusion 52 extends the optimization (<a 
href="https://github.com/apache/datafusion/issues/17171">#17171</a> / <a 
href="https://github.com/apache/datafusion/pull/18393">#18393</a>) to use an 
<code>IN</code> list when the
-build size is small such as when the join is very selective or a reference to 
the build side hash map when the build side is larger.
-These new expressions are pushed down to the probe side scan and is used to 
prune files, row groups, and
-individual rows.
-When the build side is small enough (&lt;=20 rows but configurable) the pushed 
down filters can even participate in statistics pruning to avoid even reading 
the join keys from row groups that will not match.</p>
-<p>Thanks to <a href="https://github.com/adriangb">adriangb</a> for 
implementing this feature, with
-reviews from <a href="https://github.com/LiaCastaneda">LiaCastaneda</a>, <a 
href="https://github.com/asolimando">asolimando</a>, <a 
href="https://github.com/comphead">comphead</a>, and <a 
href="https://github.com/mbutrovich">mbutrovich</a>.</p>
+DataFusion 52 extends the optimization (<a 
href="https://github.com/apache/datafusion/issues/17171">#17171</a> / <a 
href="https://github.com/apache/datafusion/pull/18393">#18393</a>) to pass the
+contents of the build side hash map. These filters are evaluated on the probe
+side scan to prune files, row groups, and individual rows. When the build side
+contains <code>20</code> or fewer rows (configurable) the contents of the hash 
map are
+transformed to an <code>IN</code> expression and used for <a 
href="https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html">statistics-based
 pruning</a> which
+can avoid reading entire files or row groups that contain no matching join 
keys.
+Thanks to <a href="https://github.com/adriangb">adriangb</a> for implementing 
this feature, with reviews from
+<a href="https://github.com/LiaCastaneda">LiaCastaneda</a>, <a 
href="https://github.com/asolimando">asolimando</a>, <a 
href="https://github.com/comphead">comphead</a>, and <a 
href="https://github.com/mbutrovich">mbutrovich</a>.</p>
 <h2 id="major-features">Major Features ✨<a class="headerlink" 
href="#major-features" title="Permanent link">¶</a></h2>
 <h3 id="arrow-ipc-stream-file-support">Arrow IPC Stream file support<a 
class="headerlink" href="#arrow-ipc-stream-file-support" title="Permanent 
link">¶</a></h3>
 <p>DataFusion can now read Arrow IPC stream files (<a 
href="https://github.com/apache/datafusion/pull/18457">#18457</a>). This expands
diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index 1aa4cf9..ab1efa7 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -355,8 +355,8 @@ leaving other queries unchanged or modestly faster. Thanks 
to &lt;a href="https:
 the implementation and reviews from &lt;a 
href="https://github.com/Dandandan"&gt;Dandandan&lt;/a&gt;.&lt;/p&gt;
 &lt;h3 id="caching-improvements"&gt;Caching Improvements&lt;a 
class="headerlink" href="#caching-improvements" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
 &lt;p&gt;This release also includes several additional caching 
improvements.&lt;/p&gt;
-&lt;p&gt;A new statistics cache for Parquet Metadata avoids repeatedly 
(re)calculating
-statistics for Parquet backed files. This significantly improves planning time
+&lt;p&gt;A new statistics cache for File Metadata avoids repeatedly 
(re)calculating
+statistics for files. This significantly improves planning time
 for certain queries. You can see the contents of the new cache using the
 &lt;a 
href="https://datafusion.apache.org/user-guide/cli/functions.html#statistics-cache"&gt;statistics_cache&lt;/a&gt;
 function in the CLI:&lt;/p&gt;
 &lt;pre&gt;&lt;code class="language-sql"&gt;select * from statistics_cache();
@@ -403,13 +403,14 @@ Related PRs: &lt;a 
href="https://github.com/apache/datafusion/pull/18146"&gt;#18
 dynamically to scans, as explained in the &lt;a 
href="https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters"&gt;Dynamic
 Filtering Blog&lt;/a&gt; using a
 technique referred to as &lt;a 
href="https://dl.acm.org/doi/10.1109/ICDE.2008.4497486"&gt;Sideways Information 
Passing&lt;/a&gt; in Database research
 literature. The initial implementation passed min/max values for the join keys.
-DataFusion 52 extends the optimization (&lt;a 
href="https://github.com/apache/datafusion/issues/17171"&gt;#17171&lt;/a&gt; / 
&lt;a 
href="https://github.com/apache/datafusion/pull/18393"&gt;#18393&lt;/a&gt;) to 
use an &lt;code&gt;IN&lt;/code&gt; list when the
-build size is small such as when the join is very selective or a reference to 
the build side hash map when the build side is larger.
-These new expressions are pushed down to the probe side scan and is used to 
prune files, row groups, and
-individual rows.
-When the build side is small enough (&amp;lt;=20 rows but configurable) the 
pushed down filters can even participate in statistics pruning to avoid even 
reading the join keys from row groups that will not match.&lt;/p&gt;
-&lt;p&gt;Thanks to &lt;a 
href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; for implementing this 
feature, with
-reviews from &lt;a 
href="https://github.com/LiaCastaneda"&gt;LiaCastaneda&lt;/a&gt;, &lt;a 
href="https://github.com/asolimando"&gt;asolimando&lt;/a&gt;, &lt;a 
href="https://github.com/comphead"&gt;comphead&lt;/a&gt;, and &lt;a 
href="https://github.com/mbutrovich"&gt;mbutrovich&lt;/a&gt;.&lt;/p&gt;
+DataFusion 52 extends the optimization (&lt;a 
href="https://github.com/apache/datafusion/issues/17171"&gt;#17171&lt;/a&gt; / 
&lt;a 
href="https://github.com/apache/datafusion/pull/18393"&gt;#18393&lt;/a&gt;) to 
pass the
+contents of the build side hash map. These filters are evaluated on the probe
+side scan to prune files, row groups, and individual rows. When the build side
+contains &lt;code&gt;20&lt;/code&gt; or fewer rows (configurable) the contents 
of the hash map are
+transformed to an &lt;code&gt;IN&lt;/code&gt; expression and used for &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html"&gt;statistics-based
 pruning&lt;/a&gt; which
+can avoid reading entire files or row groups that contain no matching join 
keys.
+Thanks to &lt;a href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; for 
implementing this feature, with reviews from
+&lt;a href="https://github.com/LiaCastaneda"&gt;LiaCastaneda&lt;/a&gt;, &lt;a 
href="https://github.com/asolimando"&gt;asolimando&lt;/a&gt;, &lt;a 
href="https://github.com/comphead"&gt;comphead&lt;/a&gt;, and &lt;a 
href="https://github.com/mbutrovich"&gt;mbutrovich&lt;/a&gt;.&lt;/p&gt;
 &lt;h2 id="major-features"&gt;Major Features ✨&lt;a class="headerlink" 
href="#major-features" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
 &lt;h3 id="arrow-ipc-stream-file-support"&gt;Arrow IPC Stream file 
support&lt;a class="headerlink" href="#arrow-ipc-stream-file-support" 
title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
 &lt;p&gt;DataFusion can now read Arrow IPC stream files (&lt;a 
href="https://github.com/apache/datafusion/pull/18457"&gt;#18457&lt;/a&gt;). 
This expands
diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index 896d617..0254d55 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -355,8 +355,8 @@ leaving other queries unchanged or modestly faster. Thanks 
to &lt;a href="https:
 the implementation and reviews from &lt;a 
href="https://github.com/Dandandan"&gt;Dandandan&lt;/a&gt;.&lt;/p&gt;
 &lt;h3 id="caching-improvements"&gt;Caching Improvements&lt;a 
class="headerlink" href="#caching-improvements" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
 &lt;p&gt;This release also includes several additional caching 
improvements.&lt;/p&gt;
-&lt;p&gt;A new statistics cache for Parquet Metadata avoids repeatedly 
(re)calculating
-statistics for Parquet backed files. This significantly improves planning time
+&lt;p&gt;A new statistics cache for File Metadata avoids repeatedly 
(re)calculating
+statistics for files. This significantly improves planning time
 for certain queries. You can see the contents of the new cache using the
 &lt;a 
href="https://datafusion.apache.org/user-guide/cli/functions.html#statistics-cache"&gt;statistics_cache&lt;/a&gt;
 function in the CLI:&lt;/p&gt;
 &lt;pre&gt;&lt;code class="language-sql"&gt;select * from statistics_cache();
@@ -403,13 +403,14 @@ Related PRs: &lt;a 
href="https://github.com/apache/datafusion/pull/18146"&gt;#18
 dynamically to scans, as explained in the &lt;a 
href="https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters"&gt;Dynamic
 Filtering Blog&lt;/a&gt; using a
 technique referred to as &lt;a 
href="https://dl.acm.org/doi/10.1109/ICDE.2008.4497486"&gt;Sideways Information 
Passing&lt;/a&gt; in Database research
 literature. The initial implementation passed min/max values for the join keys.
-DataFusion 52 extends the optimization (&lt;a 
href="https://github.com/apache/datafusion/issues/17171"&gt;#17171&lt;/a&gt; / 
&lt;a 
href="https://github.com/apache/datafusion/pull/18393"&gt;#18393&lt;/a&gt;) to 
use an &lt;code&gt;IN&lt;/code&gt; list when the
-build size is small such as when the join is very selective or a reference to 
the build side hash map when the build side is larger.
-These new expressions are pushed down to the probe side scan and is used to 
prune files, row groups, and
-individual rows.
-When the build side is small enough (&amp;lt;=20 rows but configurable) the 
pushed down filters can even participate in statistics pruning to avoid even 
reading the join keys from row groups that will not match.&lt;/p&gt;
-&lt;p&gt;Thanks to &lt;a 
href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; for implementing this 
feature, with
-reviews from &lt;a 
href="https://github.com/LiaCastaneda"&gt;LiaCastaneda&lt;/a&gt;, &lt;a 
href="https://github.com/asolimando"&gt;asolimando&lt;/a&gt;, &lt;a 
href="https://github.com/comphead"&gt;comphead&lt;/a&gt;, and &lt;a 
href="https://github.com/mbutrovich"&gt;mbutrovich&lt;/a&gt;.&lt;/p&gt;
+DataFusion 52 extends the optimization (&lt;a 
href="https://github.com/apache/datafusion/issues/17171"&gt;#17171&lt;/a&gt; / 
&lt;a 
href="https://github.com/apache/datafusion/pull/18393"&gt;#18393&lt;/a&gt;) to 
pass the
+contents of the build side hash map. These filters are evaluated on the probe
+side scan to prune files, row groups, and individual rows. When the build side
+contains &lt;code&gt;20&lt;/code&gt; or fewer rows (configurable) the contents 
of the hash map are
+transformed to an &lt;code&gt;IN&lt;/code&gt; expression and used for &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html"&gt;statistics-based
 pruning&lt;/a&gt; which
+can avoid reading entire files or row groups that contain no matching join 
keys.
+Thanks to &lt;a href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; for 
implementing this feature, with reviews from
+&lt;a href="https://github.com/LiaCastaneda"&gt;LiaCastaneda&lt;/a&gt;, &lt;a 
href="https://github.com/asolimando"&gt;asolimando&lt;/a&gt;, &lt;a 
href="https://github.com/comphead"&gt;comphead&lt;/a&gt;, and &lt;a 
href="https://github.com/mbutrovich"&gt;mbutrovich&lt;/a&gt;.&lt;/p&gt;
 &lt;h2 id="major-features"&gt;Major Features ✨&lt;a class="headerlink" 
href="#major-features" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
 &lt;h3 id="arrow-ipc-stream-file-support"&gt;Arrow IPC Stream file 
support&lt;a class="headerlink" href="#arrow-ipc-stream-file-support" 
title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
 &lt;p&gt;DataFusion can now read Arrow IPC stream files (&lt;a 
href="https://github.com/apache/datafusion/pull/18457"&gt;#18457&lt;/a&gt;). 
This expands
diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml
index c0c97d7..9f44274 100644
--- a/blog/feeds/pmc.atom.xml
+++ b/blog/feeds/pmc.atom.xml
@@ -71,8 +71,8 @@ leaving other queries unchanged or modestly faster. Thanks to 
&lt;a href="https:
 the implementation and reviews from &lt;a 
href="https://github.com/Dandandan"&gt;Dandandan&lt;/a&gt;.&lt;/p&gt;
 &lt;h3 id="caching-improvements"&gt;Caching Improvements&lt;a 
class="headerlink" href="#caching-improvements" title="Permanent 
link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
 &lt;p&gt;This release also includes several additional caching 
improvements.&lt;/p&gt;
-&lt;p&gt;A new statistics cache for Parquet Metadata avoids repeatedly 
(re)calculating
-statistics for Parquet backed files. This significantly improves planning time
+&lt;p&gt;A new statistics cache for File Metadata avoids repeatedly 
(re)calculating
+statistics for files. This significantly improves planning time
 for certain queries. You can see the contents of the new cache using the
 &lt;a 
href="https://datafusion.apache.org/user-guide/cli/functions.html#statistics-cache"&gt;statistics_cache&lt;/a&gt;
 function in the CLI:&lt;/p&gt;
 &lt;pre&gt;&lt;code class="language-sql"&gt;select * from statistics_cache();
@@ -119,13 +119,14 @@ Related PRs: &lt;a 
href="https://github.com/apache/datafusion/pull/18146"&gt;#18
 dynamically to scans, as explained in the &lt;a 
href="https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters"&gt;Dynamic
 Filtering Blog&lt;/a&gt; using a
 technique referred to as &lt;a 
href="https://dl.acm.org/doi/10.1109/ICDE.2008.4497486"&gt;Sideways Information 
Passing&lt;/a&gt; in Database research
 literature. The initial implementation passed min/max values for the join keys.
-DataFusion 52 extends the optimization (&lt;a 
href="https://github.com/apache/datafusion/issues/17171"&gt;#17171&lt;/a&gt; / 
&lt;a 
href="https://github.com/apache/datafusion/pull/18393"&gt;#18393&lt;/a&gt;) to 
use an &lt;code&gt;IN&lt;/code&gt; list when the
-build size is small such as when the join is very selective or a reference to 
the build side hash map when the build side is larger.
-These new expressions are pushed down to the probe side scan and is used to 
prune files, row groups, and
-individual rows.
-When the build side is small enough (&amp;lt;=20 rows but configurable) the 
pushed down filters can even participate in statistics pruning to avoid even 
reading the join keys from row groups that will not match.&lt;/p&gt;
-&lt;p&gt;Thanks to &lt;a 
href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; for implementing this 
feature, with
-reviews from &lt;a 
href="https://github.com/LiaCastaneda"&gt;LiaCastaneda&lt;/a&gt;, &lt;a 
href="https://github.com/asolimando"&gt;asolimando&lt;/a&gt;, &lt;a 
href="https://github.com/comphead"&gt;comphead&lt;/a&gt;, and &lt;a 
href="https://github.com/mbutrovich"&gt;mbutrovich&lt;/a&gt;.&lt;/p&gt;
+DataFusion 52 extends the optimization (&lt;a 
href="https://github.com/apache/datafusion/issues/17171"&gt;#17171&lt;/a&gt; / 
&lt;a 
href="https://github.com/apache/datafusion/pull/18393"&gt;#18393&lt;/a&gt;) to 
pass the
+contents of the build side hash map. These filters are evaluated on the probe
+side scan to prune files, row groups, and individual rows. When the build side
+contains &lt;code&gt;20&lt;/code&gt; or fewer rows (configurable) the contents 
of the hash map are
+transformed to an &lt;code&gt;IN&lt;/code&gt; expression and used for &lt;a 
href="https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html"&gt;statistics-based
 pruning&lt;/a&gt; which
+can avoid reading entire files or row groups that contain no matching join 
keys.
+Thanks to &lt;a href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; for 
implementing this feature, with reviews from
+&lt;a href="https://github.com/LiaCastaneda"&gt;LiaCastaneda&lt;/a&gt;, &lt;a 
href="https://github.com/asolimando"&gt;asolimando&lt;/a&gt;, &lt;a 
href="https://github.com/comphead"&gt;comphead&lt;/a&gt;, and &lt;a 
href="https://github.com/mbutrovich"&gt;mbutrovich&lt;/a&gt;.&lt;/p&gt;
 &lt;h2 id="major-features"&gt;Major Features ✨&lt;a class="headerlink" 
href="#major-features" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
 &lt;h3 id="arrow-ipc-stream-file-support"&gt;Arrow IPC Stream file 
support&lt;a class="headerlink" href="#arrow-ipc-stream-file-support" 
title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
 &lt;p&gt;DataFusion can now read Arrow IPC stream files (&lt;a 
href="https://github.com/apache/datafusion/pull/18457"&gt;#18457&lt;/a&gt;). 
This expands


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
