This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 855af91  Commit build products
855af91 is described below

commit 855af9154aea006f3700752f4e190e4c4c00fdb0
Author: Build Pelican (action) <[email protected]>
AuthorDate: Wed Mar 26 13:13:39 2025 +0000

    Commit build products
---
 output/2025/03/24/datafusion-46.0.0/index.html     | 147 +++++++++++++++++++++
 ...anci-and-berkay-sahin-on-behalf-of-the-pmc.html | 104 +++++++++++++++
 output/category/blog.html                          |  35 +++++
 output/feed.xml                                    |  18 ++-
 output/feeds/all-en.atom.xml                       | 104 ++++++++++++++-
 output/feeds/blog.atom.xml                         | 104 ++++++++++++++-
 ...-and-berkay-sahin-on-behalf-of-the-pmc.atom.xml | 104 +++++++++++++++
 ...i-and-berkay-sahin-on-behalf-of-the-pmc.rss.xml |  18 +++
 .../datafusion-46.0.0/diagnostic-example.png       | Bin 0 -> 140405 bytes
 output/index.html                                  |  35 +++++
 10 files changed, 666 insertions(+), 3 deletions(-)

diff --git a/output/2025/03/24/datafusion-46.0.0/index.html 
b/output/2025/03/24/datafusion-46.0.0/index.html
new file mode 100644
index 0000000..8e11f06
--- /dev/null
+++ b/output/2025/03/24/datafusion-46.0.0/index.html
@@ -0,0 +1,147 @@
+<!doctype html>
+<html class="no-js" lang="en" dir="ltr">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="x-ua-compatible" content="ie=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Apache DataFusion 46.0.0 Released - Apache DataFusion Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script>  </head>
+  <body class="d-flex flex-column h-100">
+  <main class="flex-shrink-0">
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth 
navbar example">
+    <div class="container-fluid">
+        <a class="navbar-brand" href="/blog"><img 
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache 
DataFusion Blog</a>
+        <button class="navbar-toggler" type="button" data-bs-toggle="collapse" 
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false" 
aria-label="Toggle navigation">
+            <span class="navbar-toggler-icon"></span>
+        </button>
+
+        <div class="collapse navbar-collapse" id="navbarADP">
+            <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/about.html">About</a>
+                </li>
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/feed.xml">RSS</a>
+                </li>
+            </ul>
+        </div>
+    </div>
+</nav>    
+
+
+<!-- page contents -->
+<div id="contents">
+    <div class="bg-white p-5 rounded">
+        <div class="col-sm-8 mx-auto">
+          <h1>
+              Apache DataFusion 46.0.0 Released
+          </h1>
+              <p>Posted on: Mon 24 March 2025 by Oznur Hanci and Berkay Sahin 
on behalf of the PMC</p>
+              <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We&rsquo;re excited to announce the release of&nbsp;<strong>Apache 
DataFusion 46.0.0</strong>! This new version represents a significant milestone 
for the project, packing in a wide range of improvements and fixes. You can 
find the complete details in the&nbsp;full <a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md">changelog</a>.
 We&rsquo;ll highlight the most important changes below and guide you through 
upgrading.</p>
+<h2>Breaking Changes</h2>
+<p>DataFusion 46.0.0 brings a few&nbsp;<strong>breaking 
changes</strong>&nbsp;that may require adjustments to your code as described in 
the <a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html">Upgrade 
Guide</a>. Here are the most notable ones:</p>
+<ul>
+<li><a href="https://github.com/apache/datafusion/pull/14224#">Unified 
<code>DataSourceExec</code> Execution 
Plan</a><strong>:</strong>&nbsp;DataFusion 46.0.0 introduces a major refactor 
of scan operators. The separate file-format-specific execution plan nodes 
(<code>ParquetExec</code>,&nbsp;<code>CsvExec</code>,&nbsp;<code>JsonExec</code>,&nbsp;<code>AvroExec</code>,
 etc.) have been&nbsp;<strong>deprecated and merged into a single 
<code>DataSourceExec</code>&nbsp;plan</strong>. Format-s [...]
+<li><a 
href="https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2">Error 
Handling Improvements</a> 
(<code>DataFusionError::Collection</code>)<strong>:</strong>&nbsp;We 
began overhauling DataFusion&rsquo;s approach to error handling. In this 
release, a new error 
variant&nbsp;<code>DataFusionError::Collection</code>&nbsp;(and related 
mechanisms) has been introduced to aggregate multiple errors into one. This is 
part of a broader effort to provide richer error context and reduce internal 
pan [...]
+</ul>
+<h2>Performance Improvements</h2>
+<p>DataFusion 46.0.0 comes with a slew of performance enhancements across the 
board. Here are some of the noteworthy optimizations in this release:</p>
+<ul>
+<li><strong>Faster&nbsp;<code>median()</code>&nbsp;(no 
grouping):</strong>&nbsp;The&nbsp;<code>median()</code>&nbsp;aggregate function 
got a special fast path when used without a&nbsp;<code>GROUP BY</code>. By 
optimizing its accumulator, median calculation is about&nbsp;<strong>2&times; 
faster</strong>&nbsp;in the single-group case. If you 
use&nbsp;<code>MEDIAN()</code>&nbsp;on large datasets (especially as a single 
value), you should notice reduced query times (PR <a href="https://githu [...]
+<li><strong>Optimized&nbsp;<code>FIRST_VALUE</code>/<code>LAST_VALUE</code>:</strong>&nbsp;The&nbsp;<code>FIRST_VALUE</code>&nbsp;and&nbsp;<code>LAST_VALUE</code>&nbsp;window
 functions have been improved by avoiding an internal sort of rows. Instead of 
sorting each partition, the implementation now uses a direct approach to pick 
the first/last element. This yields&nbsp;<strong>10&ndash;100% performance 
improvement</strong>&nbsp;for these functions, depending on the scenario. 
Queries usin [...]
+<li><strong><code>repeat()</code>&nbsp;String Function 
Boost:</strong>&nbsp;Repeating strings is now more efficient &ndash; 
the&nbsp;<code>repeat(text, n)</code>&nbsp;function was optimized by 
about&nbsp;<strong>50%</strong>. This was achieved by reducing allocations and 
using a more efficient concatenation strategy. If you generate large repeated 
strings in queries, this can cut the time nearly in half (PR <a 
href="https://github.com/apache/datafusion/pull/14697">#14697</a> by <a href=" 
[...]
+<li><strong>Ultra-fast&nbsp;<code>uuid()</code>&nbsp;UDF:</strong>&nbsp;The&nbsp;<code>uuid()</code>&nbsp;function
 (which generates random UUID strings) received a major speed-up. It&rsquo;s 
now roughly&nbsp;<strong>40&times; faster</strong>&nbsp;than before! The new 
implementation avoids unnecessary string copying and uses a more direct 
conversion to hex, making bulk UUID generation far more practical (PR <a 
href="https://github.com/apache/datafusion/pull/14675">#14675</a> by <a href="h 
[...]
+<li><strong>Accelerated&nbsp;<code>chr()</code>&nbsp;and&nbsp;<code>to_hex()</code>:</strong>&nbsp;Several
 scalar functions have been micro-optimized. 
The&nbsp;<code>chr()</code>&nbsp;function (which returns the character for a 
given ASCII code) is about&nbsp;<strong>4&times; faster</strong>&nbsp;now, and 
the&nbsp;<code>to_hex()</code>&nbsp;function (which converts numbers to hex 
string) is roughly&nbsp;<strong>2&times; faster</strong>. These improvements 
may be most noticeable in tight  [...]
+<li><strong>No More RowConverter in Grouped Ordering:</strong>&nbsp;We removed 
an inefficient step in the&nbsp;<em>partial grouping</em>&nbsp;algorithm. 
The&nbsp;<code>GroupOrderingPartial</code>&nbsp;operator no longer converts 
data to &ldquo;row format&rdquo; for each batch 
(via&nbsp;<code>RowConverter</code>). Instead, it uses a direct arrow-based 
approach to detect sort key changes. This eliminated overhead and yields a nice 
speedup for certain aggregation queries. (PR <a href="https [...]
+<li><strong>Predicate Pruning for&nbsp;<code>NOT 
LIKE</code>:</strong>&nbsp;DataFusion&rsquo;s parquet reader can now prune row 
groups using&nbsp;<code>NOT LIKE</code>&nbsp;filters, similar to how it 
handles&nbsp;<code>LIKE</code>. This means if you have a filter such 
as&nbsp;<code>column NOT LIKE 'prefix%'</code>, DataFusion can use min/max 
statistics to skip reading files/parts that can be determined to either 
entirely match or not match the predicate. In particular, a pattern like&nbs 
[...]
+</ul>
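+<p>The <code>NOT LIKE</code> pruning idea from the last item above can be sketched in a few lines of Python. This is only an illustration of the min/max reasoning, not DataFusion&rsquo;s implementation: the function name <code>can_prune_not_like_prefix</code> is hypothetical, and the sketch assumes exact (untruncated) string statistics and a pattern of the form <code>'prefix%'</code> with no other wildcards.</p>

```python
def can_prune_not_like_prefix(min_value: str, max_value: str, prefix: str) -> bool:
    """Decide whether a row group can be skipped for `col NOT LIKE 'prefix%'`.

    Hypothetical helper for illustration only. Assumes min_value and max_value
    are exact lexicographic bounds for every non-null value in the row group.
    """
    # If both bounds start with the prefix, every value between them must also
    # start with it, so no row can satisfy NOT LIKE 'prefix%'.
    return min_value.startswith(prefix) and max_value.startswith(prefix)


# Every value lies between "foo_a" and "foo_z", so all of them match 'foo%':
# the NOT LIKE filter keeps nothing and the row group can be skipped.
skip_group = can_prune_not_like_prefix("foo_a", "foo_z", "foo")

# Here the minimum does not start with "foo", so some rows may pass the filter
# and the row group has to be read.
must_read = not can_prune_not_like_prefix("fa", "foo_z", "foo")
```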
+<h2>Google Summer of Code 2025</h2>
+<p>Another exciting development:&nbsp;<strong>Apache DataFusion has been 
accepted as a mentoring organization for Google Summer of Code (GSoC) 
2025</strong>! 🎉 This means that this summer, students from around the world 
will have the opportunity to contribute to DataFusion under the guidance of our 
committers. We have put together <a 
href="https://datafusion.apache.org/contributor-guide/gsoc_project_ideas.html">a
 list of project ideas</a> that candidates can choose from.</p>
+<p>If you&rsquo;re interested, check out our&nbsp;<a 
href="https://datafusion.apache.org/contributor-guide/gsoc_application_guidelines.html">GSoC
 Application Guidelines</a>. We encourage students to reach out, discuss ideas 
with us, and apply.</p>
+<h2>Highlighted New Features</h2>
+<h3>Improved Diagnostics</h3>
+<p>DataFusion 46.0.0 introduces a new&nbsp;<a 
href="https://github.com/apache/datafusion/issues/14429"><strong>SQL 
Diagnostics framework</strong></a>&nbsp;to make error messages more 
understandable. This comes in the form of 
new&nbsp;<code>Diagnostic</code>&nbsp;and&nbsp;<code>DiagnosticEntry</code>&nbsp;types,
 which allow the system to attach rich context (like source query text spans) 
to error messages. In practical terms, certain planner errors will now point to 
the exact location in  [...]
+<p>For example, if you reference an unknown table or omit a column from 
<code>GROUP BY</code>, the error message will include the query snippet causing 
the error. These diagnostics are meant for end-users of applications built on 
DataFusion, providing clearer messages instead of generic errors. Here&rsquo;s 
an example:</p>
+<p><img alt="diagnostic-example" class="img-responsive" 
src="/blog/images/datafusion-46.0.0/diagnostic-example.png" width="80%"/></p>
+<p>Currently, diagnostics cover unresolved table/column references, missing 
<code>GROUP BY</code> columns, ambiguous references, wrong number of UNION 
columns, type mismatches, and a few others. Future releases will extend this to 
more error types. This feature should greatly ease debugging of complex SQL by 
pinpointing errors directly in the query text. We thank <a 
href="https://github.com/eliaperantoni">@eliaperantoni</a> for his 
contributions to this project.</p>
+<h3>Unified&nbsp;<code>DataSourceExec</code>&nbsp;for Table Providers</h3>
+<p>As mentioned, DataFusion now uses a 
unified&nbsp;<code>DataSourceExec</code>&nbsp;for reading tables, which is both 
a breaking change and a feature.&nbsp;<em>Why is this important?</em>&nbsp;The 
new approach simplifies how custom table providers are integrated and 
optimized. Namely, the optimizer can treat file scans uniformly and push down 
filters/limits more consistently when there is one execution plan that handles 
all data sources. The new&nbsp;<code>DataSourceExec</code>&nbsp;is  [...]
+<p>All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been 
migrated to this framework. This unification makes the codebase cleaner and 
sets the stage for future enhancements (like consistent metadata handling and 
limit pushdown across all formats). Check out PR <a 
href="https://github.com/apache/datafusion/pull/14224">#14224</a> for design 
details. We thank <a 
href="https://github.com/mertak-synnada">@mertak-synnada</a> and <a 
href="https://github.com/ozankabak">@ozankabak [...]
+<h3>FFI Support for Scalar UDFs</h3>
+<p>DataFusion&rsquo;s Foreign Function Interface (FFI) has been extended to 
support&nbsp;<a 
href="https://github.com/apache/datafusion/pull/14579"><strong>user-defined 
scalar functions</strong></a>&nbsp;defined in external languages. In 46.0.0, 
you can now expose a custom scalar UDF through the FFI layer and use it in 
DataFusion as if it were built-in. This is particularly exciting for the 
<strong>Python bindings</strong> and other language integrations &ndash; it 
means you could define  [...]
+<h3>New Statistics/Distribution Framework</h3>
+<p>This release, thanks mainly to <a 
href="https://github.com/Fly-Style">@Fly-Style</a> with contributions from <a 
href="https://github.com/ozankabak">@ozankabak</a> and <a 
href="https://github.com/berkaysynnada">@berkaysynnada</a>, includes the 
initial pieces of a&nbsp;<a 
href="https://github.com/apache/datafusion/pull/14699">redesigned statistics 
framework</a>.<strong> DataFusion&rsquo;s optimizer can now represent column 
data distributions using a new&nbsp;<code>Distribution</code>& [...]
+<p>For example, if a filter expression is applied to a column with a known 
uniform distribution range, the optimizer can propagate that to estimate result 
selectivity more accurately. Similarly, comparisons (<code>=</code>, 
<code>&gt;</code>, etc.) on columns yield Bernoulli distributions (with 
true/false probabilities) in this model.</p>
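+<p>To make the idea concrete, here is a minimal Python sketch of how a comparison on a uniformly distributed column could yield a Bernoulli distribution. The classes <code>Uniform</code> and <code>Bernoulli</code> and the function <code>greater_than</code> are hypothetical names chosen for this illustration; they are not DataFusion&rsquo;s <code>Distribution</code> API, and the real framework may compute probabilities differently.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uniform:
    """Hypothetical model of a column whose values are uniform on [low, high]."""
    low: float
    high: float


@dataclass(frozen=True)
class Bernoulli:
    """Hypothetical model of a boolean expression that is true with probability p."""
    p: float


def greater_than(col: Uniform, threshold: float) -> Bernoulli:
    """Estimate the distribution of `col > threshold` for a uniform column."""
    if col.high <= col.low:
        raise ValueError("expected a non-empty range with low < high")
    # Fraction of the range that lies above the threshold, clamped to [0, 1].
    fraction = (col.high - threshold) / (col.high - col.low)
    return Bernoulli(p=min(1.0, max(0.0, fraction)))


# A filter `c > 75` on a column uniform over [0, 100] keeps about a quarter of the rows.
selectivity = greater_than(Uniform(0.0, 100.0), 75.0).p
```

+<p>An optimizer could multiply such a probability by the input row count to estimate how many rows survive the filter.</p>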
+<p>This is a foundational change with many follow-on PRs underway. Although 
the immediate user-visible effect is limited (the optimizer didn't magically 
improve by an order of magnitude overnight), it lays the groundwork for more 
advanced query planning in the future. Over time, as the statistics information 
encapsulated in <code>Distribution</code>s gets integrated, DataFusion will be 
able to make smarter decisions like more aggressive parquet pruning, better 
join orderings, and so on bas [...]
+<h3>Aggregate Monotonicity and Window Ordering</h3>
+<p>DataFusion 46.0.0 adds a new concept of <a 
href="https://github.com/apache/datafusion/pull/14271"><strong>set-monotonicity</strong></a>
 for certain transformations, which helps avoid unnecessary sort operations. In 
particular, the planner now understands when a <strong>window function 
introduces new orderings of data</strong>.</p>
+<p>For example, DataFusion now recognizes that a window-aggregate like 
<code>MAX</code> on a column can produce a result that is <strong>monotonically 
increasing</strong>, even if the input column is unordered &mdash; depending on 
the window frame used.</p>
+<p>Consider the following query:</p>
+<div class="codehilite"><pre><span></span><code><span class="k">SELECT</span> 
<span class="k">MAX</span><span class="p">(</span><span 
class="n">c1</span><span class="p">)</span> <span class="n">OVER</span> <span 
class="p">(</span>
+    <span class="k">ROWS</span> <span class="k">BETWEEN</span> <span 
class="n">UNBOUNDED</span> <span class="n">PRECEDING</span> <span 
class="k">AND</span> <span class="k">CURRENT</span> <span class="k">ROW</span>
+<span class="p">)</span> <span class="k">AS</span> <span 
class="n">max_c1</span>
+<span class="k">FROM</span> <span class="n">c1_table</span>
+<span class="k">ORDER</span> <span class="k">BY</span> <span 
class="n">max_c1</span><span class="p">;</span>
+</code></pre></div>
+<p>In earlier versions of DataFusion, this query would require an additional 
<code>SortExec</code> on <code>max_c1</code> to satisfy the <code>ORDER 
BY</code> clause. However, with the new set-monotonicity logic, the planner 
knows that <code>MAX(...) OVER (...)</code> with this frame never produces a 
value smaller than that of the previous row, making the extra sort redundant. 
This leads to more efficient query execution.</p>
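+<p>The reasoning can be checked with a small Python sketch. It mimics <code>MAX(c1) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)</code> as a running maximum and confirms that the output is already non-decreasing even though the input is not. The helper names <code>running_max</code> and <code>is_non_decreasing</code> and the sample data are made up for this illustration.</p>

```python
from itertools import accumulate


def running_max(values):
    """Maximum of all rows from the start of the input up to the current row."""
    return list(accumulate(values, max))


def is_non_decreasing(values):
    """True when each element is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Unordered input column (sample data for illustration).
c1 = [3, 1, 4, 1, 5, 9, 2, 6]

# Window result: each row holds the largest value seen so far.
max_c1 = running_max(c1)

# The window output is already in ORDER BY max_c1 order, so sorting it again
# would not change anything.
sort_is_redundant = is_non_decreasing(max_c1) and sorted(max_c1) == max_c1
```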
+<p>PR <a href="https://github.com/apache/datafusion/pull/14271">#14271</a> 
introduced the core monotonicity tracking for aggregates and window functions.
+PR <a href="https://github.com/apache/datafusion/pull/14813">#14813</a> 
improved ordering preservation within various window frame types and added 
extensive test coverage.
+Huge thanks to <a href="https://github.com/berkaysynnada">@berkaysynnada</a> 
and <a href="https://github.com/mertak-synnada">@mertak-synnada</a> for 
designing and implementing this optimizer enhancement!</p>
+<h3>UNION [ALL | DISTINCT] BY NAME Support</h3>
+<p>DataFusion now supports <code>UNION BY NAME</code> and <code>UNION ALL BY 
NAME</code>, which align columns by name instead of position. This matches 
functionality found in systems like Spark and DuckDB and simplifies combining 
result sets whose columns appear in different orders.</p>
+<p>You no longer need to rewrite column order manually &mdash; just write:</p>
+<div class="codehilite"><pre><span></span><code><span class="k">SELECT</span> 
<span class="n">col1</span><span class="p">,</span> <span class="n">col2</span> 
<span class="k">FROM</span> <span class="n">t1</span>
+<span class="k">UNION</span> <span class="k">ALL</span> <span 
class="k">BY</span> <span class="n">NAME</span>
+<span class="k">SELECT</span> <span class="n">col2</span><span 
class="p">,</span> <span class="n">col1</span> <span class="k">FROM</span> 
<span class="n">t2</span><span class="p">;</span>
+</code></pre></div>
+<p>Under the hood, this is supported by the new <code>union_by_name()</code> 
and <code>union_by_name_distinct()</code> plan builder methods.</p>
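+<p>Conceptually, aligning by name means reordering the second input&rsquo;s columns to match the first before concatenating rows. The Python sketch below illustrates that idea only; <code>union_all_by_name</code> is a hypothetical helper, not the DataFusion plan builder method, and it assumes both inputs expose exactly the same set of column names.</p>

```python
def union_all_by_name(left_cols, left_rows, right_cols, right_rows):
    """Concatenate two row sets, matching the right side's columns to the left by name.

    Hypothetical helper for illustration. Assumes both sides have the same
    column names, possibly in a different order.
    """
    if set(left_cols) != set(right_cols):
        raise ValueError("this sketch requires identical column names on both sides")
    # For each output column, find where that name sits in the right-hand input.
    positions = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in positions) for row in right_rows]
    return list(left_cols), list(left_rows) + reordered


# t1 exposes (col1, col2) while t2 exposes (col2, col1).
columns, rows = union_all_by_name(
    ["col1", "col2"], [(1, "a")],
    ["col2", "col1"], [("b", 2)],
)
```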
+<p>Thanks to <a href="https://github.com/rkrishn7">@rkrishn7</a> for PR <a 
href="https://github.com/apache/datafusion/pull/14538">#14538</a>.</p>
+<h3>New range() Table Function</h3>
+<p>A new table-valued function <code>range(start, stop, step)</code> has been 
added to make it easy to generate integer sequences, similar to 
PostgreSQL&rsquo;s <code>generate_series()</code> or Spark&rsquo;s 
<code>range()</code>.</p>
+<p>Example:</p>
+<div class="codehilite"><pre><span></span><code><span class="k">SELECT</span> 
<span class="o">*</span> <span class="k">FROM</span> <span 
class="n">range</span><span class="p">(</span><span class="mi">1</span><span 
class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span 
class="mi">2</span><span class="p">);</span>
+</code></pre></div>
+<p>This returns: 1, 3, 5, 7, 9. It&rsquo;s great for testing, cross joins, 
surrogate keys, and more.</p>
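+<p>For readers who think in Python, the same sequence can be reproduced with the built-in <code>range</code>. This is only an analogy: Python&rsquo;s <code>stop</code> is exclusive, and the sketch does not assert anything about how DataFusion treats the bound in other cases (with these arguments the result is the same either way). The wrapper name <code>int_range</code> is made up for the example.</p>

```python
def int_range(start, stop, step=1):
    """Integers from start up to (but not including) stop, advancing by step."""
    return list(range(start, stop, step))


# Mirrors SELECT * FROM range(1, 10, 2) from the example above.
values = int_range(1, 10, 2)
```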
+<p>Thanks to <a href="https://github.com/simonvandel">@simonvandel</a> for PR 
<a href="https://github.com/apache/datafusion/pull/14830">#14830</a>.</p>
+<h2>Upgrade Guide and Changelog</h2>
+<p>Upgrading to 46.0.0 should be straightforward for most users, but do review 
the&nbsp;<a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html">Upgrade 
Guide for DataFusion 46.0.0</a>&nbsp;for detailed steps and code changes. The 
upgrade guide covers the breaking changes mentioned (like replacing old exec 
nodes with&nbsp;<code>DataSourceExec</code>, updating UDF invocation 
to&nbsp;<code>invoke_with_args</code>, etc.) and provides code snippets to help 
with the transitio [...]
+<h2>Get Involved</h2>
+<p>Apache DataFusion is an open-source project, and we welcome involvement 
from anyone interested. Now is a great time to take 46.0.0 for a spin: try it 
out on your workloads, and let us know if you encounter any issues or have 
suggestions. You can report bugs or request features on our&nbsp;GitHub issue 
tracker, or better yet, submit a pull request. Join our community discussions 
&ndash; whether you have questions, want to share how you&rsquo;re using 
DataFusion, or are looking to contr [...]
+<p>Happy querying!</p>
+        </div>
+      </div>
+    </div>    
+    <!-- footer -->
+    <div class="row">
+      <div class="large-12 medium-12 columns">
+        <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+          Copyright 2025, <a href="https://www.apache.org/">The Apache 
Software Foundation</a>, Licensed under the <a 
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 
2.0</a>.<br/>
+          Apache&reg; and the Apache feather logo are trademarks of The Apache 
Software Foundation.
+        </p>
+      </div>
+    </div>
+    <script src="/blog/js/bootstrap.bundle.min.js"></script>  </main>
+  </body>
+</html>
diff --git 
a/output/author/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.html 
b/output/author/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.html
new file mode 100644
index 0000000..c6c7e3a
--- /dev/null
+++ b/output/author/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.html
@@ -0,0 +1,104 @@
+    <!doctype html>
+    <html class="no-js" lang="en" dir="ltr">
+    <head>
+        <meta charset="utf-8">
+        <meta http-equiv="x-ua-compatible" content="ie=edge">
+        <meta name="viewport" content="width=device-width, initial-scale=1.0">
+        <title>Apache DataFusion Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script>        <link 
href="/blog/css/blog_index.css" rel="stylesheet">
+    </head>
+    <body class="d-flex flex-column h-100">
+    <main class="flex-shrink-0">
+        <div>
+
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth 
navbar example">
+    <div class="container-fluid">
+        <a class="navbar-brand" href="/blog"><img 
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache 
DataFusion Blog</a>
+        <button class="navbar-toggler" type="button" data-bs-toggle="collapse" 
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false" 
aria-label="Toggle navigation">
+            <span class="navbar-toggler-icon"></span>
+        </button>
+
+        <div class="collapse navbar-collapse" id="navbarADP">
+            <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/about.html">About</a>
+                </li>
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/feed.xml">RSS</a>
+                </li>
+            </ul>
+        </div>
+    </div>
+</nav>
+            <div id="contents">
+                <div class="bg-white p-5 rounded">
+                    <div class="col-sm-8 mx-auto">
+<div id="contents">
+    <div class="bg-white p-5 rounded">
+        <div class="col-sm-8 mx-auto">
+
+            <h3>Welcome to the Apache DataFusion Blog!</h3>
+            <p><i>Here you can find the latest updates from DataFusion and 
related projects.</i></p>
+
+
+    <!-- Post -->
+    <div class="row">
+        <div class="callout">
+            <article class="post">
+                <header>
+                    <div class="title">
+                        <h1><a 
href="/blog/2025/03/24/datafusion-46.0.0">Apache DataFusion 46.0.0 
Released</a></h1>
+                        <p>Posted on: Mon 24 March 2025 by Oznur Hanci and 
Berkay Sahin on behalf of the PMC</p>
+                        <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We&rsquo;re excited to announce the release of&nbsp;<strong>Apache 
DataFusion 46.0.0</strong>! This new version represents a significant milestone 
for the project, packing in a wide range of improvements and fixes. You can 
find the complete details in the&nbsp;full <a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md">changelog</a>.
 We&rsquo;ll highlight the most important changes below …</p></p>
+                        <footer>
+                            <ul class="actions">
+                                <div style="text-align: right"><a 
href="/blog/2025/03/24/datafusion-46.0.0" class="button medium">Continue 
Reading</a></div>
+                            </ul>
+                            <ul class="stats">
+                            </ul>
+                        </footer>
+            </article>
+        </div>
+    </div>
+
+        </div>
+    </div>
+</div>                    </div>
+                </div>
+            </div>
+
+    <!-- footer -->
+    <div class="row">
+      <div class="large-12 medium-12 columns">
+        <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+          Copyright 2025, <a href="https://www.apache.org/">The Apache 
Software Foundation</a>, Licensed under the <a 
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 
2.0</a>.<br/>
+          Apache&reg; and the Apache feather logo are trademarks of The Apache 
Software Foundation.
+        </p>
+      </div>
+    </div>
+    <script src="/blog/js/bootstrap.bundle.min.js"></script>        </div>
+    </main>
+    </body>
+    </html>
diff --git a/output/category/blog.html b/output/category/blog.html
index 33393cc..e59878e 100644
--- a/output/category/blog.html
+++ b/output/category/blog.html
@@ -47,6 +47,41 @@
             <p><i>Here you can find the latest updates from DataFusion and 
related projects.</i></p>
 
 
+    <!-- Post -->
+    <div class="row">
+        <div class="callout">
+            <article class="post">
+                <header>
+                    <div class="title">
+                        <h1><a 
href="/blog/2025/03/24/datafusion-46.0.0">Apache DataFusion 46.0.0 
Released</a></h1>
+                        <p>Posted on: Mon 24 March 2025 by Oznur Hanci and 
Berkay Sahin on behalf of the PMC</p>
+                        <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We&rsquo;re excited to announce the release of&nbsp;<strong>Apache 
DataFusion 46.0.0</strong>! This new version represents a significant milestone 
for the project, packing in a wide range of improvements and fixes. You can 
find the complete details in the&nbsp;full <a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md">changelog</a>.
 We&rsquo;ll highlight the most important changes below …</p></p>
+                        <footer>
+                            <ul class="actions">
+                                <div style="text-align: right"><a 
href="/blog/2025/03/24/datafusion-46.0.0" class="button medium">Continue 
Reading</a></div>
+                            </ul>
+                            <ul class="stats">
+                            </ul>
+                        </footer>
+            </article>
+        </div>
+    </div>
     <!-- Post -->
     <div class="row">
         <div class="callout">
diff --git a/output/feed.xml b/output/feed.xml
index fcdbe66..8a126ff 100644
--- a/output/feed.xml
+++ b/output/feed.xml
@@ -1,5 +1,21 @@
 <?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Fri,
 21 Mar 2025 00:00:00 +0000</lastBuildDate><item><title>Efficient Filter 
Pushdown in 
Parquet</title><link>https://datafusion.apache.org/blog/2025/03/21/parquet-pushdown</link><description>&lt;style&gt;
+<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
 24 Mar 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion 
46.0.0 
Released</title><link>https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0</link><description>&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;p&gt;We&amp;rsquo;re excited to announce the release 
of&amp;nbsp;&lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new 
version represents a significant milestone for the project, packing in a wide 
range of improvements and fixes. You can find the complete details in 
the&amp;nbsp;full &lt;a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;.
 We&amp;rsquo;ll highlight the most important changes below 
…&lt;/p&gt;</descript [...]
 figure {
   margin: 20px 0;
 }
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index 450d535..effa358 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -1,5 +1,107 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion 
Blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-03-21T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Efficient
 Filter Pushdown in Parquet</title><link 
href="https://datafusion.apache.org/blog/2025/03/21/parquet-pushdown" 
rel="alter [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion 
Blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-03-24T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 46.0.0 Released</title><link 
href="https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0" 
rel="alterna [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;p&gt;We&amp;rsquo;re excited to announce the release 
of&amp;nbsp;&lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new 
version represents a significant milestone for the project, packing in a wide 
range of improvements and fixes. You can find the complete details in 
the&amp;nbsp;full &lt;a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;.
 We&amp;rsquo;ll highlight the most important changes below 
…&lt;/p&gt;</summary> [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;p&gt;We&amp;rsquo;re excited to announce the release 
of&amp;nbsp;&lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new 
version represents a significant milestone for the project, packing in a wide 
range of improvements and fixes. You can find the complete details in 
the&amp;nbsp;full &lt;a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;.
 We&amp;rsquo;ll highlight the most important changes below and guide you 
through [...]
+&lt;h2&gt;Breaking Changes&lt;/h2&gt;
+&lt;p&gt;DataFusion 46.0.0 brings a few&amp;nbsp;&lt;strong&gt;breaking 
changes&lt;/strong&gt;&amp;nbsp;that may require adjustments to your code as 
described in the &lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide&lt;/a&gt;. Here are the most notable ones:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;a 
href="https://github.com/apache/datafusion/pull/14224#"&gt;Unified 
&lt;code&gt;DataSourceExec&lt;/code&gt; Execution 
Plan&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt;&amp;nbsp;DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes 
(&lt;code&gt;ParquetExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;CsvExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;JsonExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;AvroExec&lt;/code&gt;,
 etc.) have been [...]
+&lt;li&gt;&lt;strong&gt;&lt;a 
href="https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2"&gt;Error
 Handling Improvements&lt;/a&gt; 
(&lt;code&gt;DataFusionError::Collection&lt;/code&gt;):&lt;/strong&gt;&amp;nbsp;We began 
overhauling DataFusion&amp;rsquo;s approach to error handling. In this release, 
a new error 
variant&amp;nbsp;&lt;code&gt;DataFusionError::Collection&lt;/code&gt;&amp;nbsp;(and
 related mechanisms) has been introduced to aggregate multiple errors into one. 
This is part of a broader effo [...]
+&lt;/ul&gt;
+&lt;h2&gt;Performance Improvements&lt;/h2&gt;
+&lt;p&gt;DataFusion 46.0.0 comes with a slew of performance enhancements 
across the board. Here are some of the noteworthy optimizations in this 
release:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;Faster&amp;nbsp;&lt;code&gt;median()&lt;/code&gt;&amp;nbsp;(no
 
grouping):&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;median()&lt;/code&gt;&amp;nbsp;aggregate
 function got a special fast path when used without 
a&amp;nbsp;&lt;code&gt;GROUP BY&lt;/code&gt;. By optimizing its accumulator, 
median calculation is about&amp;nbsp;&lt;strong&gt;2&amp;times; 
faster&lt;/strong&gt;&amp;nbsp;in the single-group case. If you 
use&amp;nbsp;&lt;code&gt;MEDIAN()&lt;/code&gt;&a [...]
+&lt;li&gt;&lt;strong&gt;Optimized&amp;nbsp;&lt;code&gt;FIRST_VALUE&lt;/code&gt;/&lt;code&gt;LAST_VALUE&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;FIRST_VALUE&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;LAST_VALUE&lt;/code&gt;&amp;nbsp;window
 functions have been improved by avoiding an internal sort of rows. Instead of 
sorting each partition, the implementation now uses a direct approach to pick 
the first/last element. This yields&amp;nbsp;&lt;strong&gt;10&amp;ndash [...]
+&lt;li&gt;&lt;strong&gt;&lt;code&gt;repeat()&lt;/code&gt;&amp;nbsp;String 
Function Boost:&lt;/strong&gt;&amp;nbsp;Repeating strings is now more efficient 
&amp;ndash; the&amp;nbsp;&lt;code&gt;repeat(text, 
n)&lt;/code&gt;&amp;nbsp;function was optimized by 
about&amp;nbsp;&lt;strong&gt;50%&lt;/strong&gt;. This was achieved by reducing 
allocations and using a more efficient concatenation strategy. If you generate 
large repeated strings in queries, this can cut the time nearly in half (PR &lt 
[...]
+&lt;li&gt;&lt;strong&gt;Ultra-fast&amp;nbsp;&lt;code&gt;uuid()&lt;/code&gt;&amp;nbsp;UDF:&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;uuid()&lt;/code&gt;&amp;nbsp;function
 (which generates random UUID strings) received a major speed-up. 
It&amp;rsquo;s now roughly&amp;nbsp;&lt;strong&gt;40&amp;times; 
faster&lt;/strong&gt;&amp;nbsp;than before! The new implementation avoids 
unnecessary string copying and uses a more direct conversion to hex, making 
bulk UUID generation far more practi [...]
+&lt;li&gt;&lt;strong&gt;Accelerated&amp;nbsp;&lt;code&gt;chr()&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;to_hex()&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;Several
 scalar functions have been micro-optimized. 
The&amp;nbsp;&lt;code&gt;chr()&lt;/code&gt;&amp;nbsp;function (which returns 
the character for a given ASCII code) is 
about&amp;nbsp;&lt;strong&gt;4&amp;times; faster&lt;/strong&gt;&amp;nbsp;now, 
and the&amp;nbsp;&lt;code&gt;to_hex()&lt;/code&gt;&amp;nbsp;function (which 
converts nu [...]
+&lt;li&gt;&lt;strong&gt;No More RowConverter in Grouped 
Ordering:&lt;/strong&gt;&amp;nbsp;We removed an inefficient step in 
the&amp;nbsp;&lt;em&gt;partial grouping&lt;/em&gt;&amp;nbsp;algorithm. 
The&amp;nbsp;&lt;code&gt;GroupOrderingPartial&lt;/code&gt;&amp;nbsp;operator no 
longer converts data to &amp;ldquo;row format&amp;rdquo; for each batch 
(via&amp;nbsp;&lt;code&gt;RowConverter&lt;/code&gt;). Instead, it uses a direct 
arrow-based approach to detect sort key changes. This eliminated  [...]
+&lt;li&gt;&lt;strong&gt;Predicate Pruning for&amp;nbsp;&lt;code&gt;NOT 
LIKE&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;DataFusion&amp;rsquo;s parquet 
reader can now prune row groups using&amp;nbsp;&lt;code&gt;NOT 
LIKE&lt;/code&gt;&amp;nbsp;filters, similar to how it 
handles&amp;nbsp;&lt;code&gt;LIKE&lt;/code&gt;. This means if you have a filter 
such as&amp;nbsp;&lt;code&gt;column NOT LIKE 'prefix%'&lt;/code&gt;, DataFusion 
can use min/max statistics to skip reading files/parts that can be det [...]
+&lt;/ul&gt;
+&lt;h2&gt;Google Summer of Code 2025&lt;/h2&gt;
+&lt;p&gt;Another exciting development:&amp;nbsp;&lt;strong&gt;Apache 
DataFusion has been accepted as a mentoring organization for Google Summer of 
Code (GSoC) 2025&lt;/strong&gt;! 🎉 This means that this summer, students from 
around the world will have the opportunity to contribute to DataFusion under 
the guidance of our committers. We have put together &lt;a 
href="https://datafusion.apache.org/contributor-guide/gsoc_project_ideas.html"&gt;a
 list of project ideas&lt;/a&gt; that candidates [...]
+&lt;p&gt;If you&amp;rsquo;re interested, check out our&amp;nbsp;&lt;a 
href="https://datafusion.apache.org/contributor-guide/gsoc_application_guidelines.html"&gt;GSoC
 Application Guidelines&lt;/a&gt;. We encourage students to reach out, discuss 
ideas with us, and apply.&lt;/p&gt;
+&lt;h2&gt;Highlighted New Features&lt;/h2&gt;
+&lt;h3&gt;Improved Diagnostics&lt;/h3&gt;
+&lt;p&gt;DataFusion 46.0.0 introduces a new&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/issues/14429"&gt;&lt;strong&gt;SQL 
Diagnostics framework&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;to make error messages 
more understandable. This comes in the form of 
new&amp;nbsp;&lt;code&gt;Diagnostic&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;DiagnosticEntry&lt;/code&gt;&amp;nbsp;types,
 which allow the system to attach rich context (like source query text spans) 
to error messages. In pr [...]
+&lt;p&gt;For example, if you reference an unknown table or omit a column from 
&lt;code&gt;GROUP BY&lt;/code&gt;, the error message will include the query 
snippet causing the error. These diagnostics are meant for end users of 
applications built on DataFusion, providing clearer messages instead of generic 
errors. Here&amp;rsquo;s an example:&lt;/p&gt;
+&lt;p&gt;&lt;img alt="diagnostic-example" class="img-responsive" 
src="/blog/images/datafusion-46.0.0/diagnostic-example.png" 
width="80%"/&gt;&lt;/p&gt;
+&lt;p&gt;Currently, diagnostics cover unresolved table/column references, 
missing &lt;code&gt;GROUP BY&lt;/code&gt; columns, ambiguous references, wrong 
number of UNION columns, type mismatches, and a few others. Future releases 
will extend this to more error types. This feature should greatly ease 
debugging of complex SQL by pinpointing errors directly in the query text. We 
thank &lt;a href="https://github.com/eliaperantoni"&gt;@eliaperantoni&lt;/a&gt; 
for his contributions in this proj [...]
+&lt;h3&gt;Unified&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;&amp;nbsp;for
 Table Providers&lt;/h3&gt;
+&lt;p&gt;As mentioned, DataFusion now uses a 
unified&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;&amp;nbsp;for reading 
tables, which is both a breaking change and a feature.&amp;nbsp;&lt;em&gt;Why 
is this important?&lt;/em&gt;&amp;nbsp;The new approach simplifies how custom 
table providers are integrated and optimized. Namely, the optimizer can treat 
file scans uniformly and push down filters/limits more consistently when there 
is one execution plan that handles all data sources. The [...]
+&lt;p&gt;All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have 
been migrated to this framework. This unification makes the codebase cleaner 
and sets the stage for future enhancements (like consistent metadata handling 
and limit pushdown across all formats). Check out PR &lt;a 
href="https://github.com/apache/datafusion/pull/14224"&gt;#14224&lt;/a&gt; for 
design details. We thank &lt;a 
href="https://github.com/mertak-synnada"&gt;@mertak-synnada&lt;/a&gt; and &lt;a 
href="https:/ [...]
+&lt;h3&gt;FFI Support for Scalar UDFs&lt;/h3&gt;
+&lt;p&gt;DataFusion&amp;rsquo;s Foreign Function Interface (FFI) has been 
extended to support&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/pull/14579"&gt;&lt;strong&gt;user-defined
 scalar functions&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;defined in external 
languages. In 46.0.0, you can now expose a custom scalar UDF through the FFI 
layer and use it in DataFusion as if it were built-in. This is particularly 
exciting for the &lt;strong&gt;Python bindings&lt;/strong&gt; and other la [...]
+&lt;h3&gt;New Statistics/Distribution Framework&lt;/h3&gt;
+&lt;p&gt;This release, thanks mainly to &lt;a 
href="https://github.com/Fly-Style"&gt;@Fly-Style&lt;/a&gt; with contributions 
from &lt;a href="https://github.com/ozankabak"&gt;@ozankabak&lt;/a&gt; and 
&lt;a href="https://github.com/berkaysynnada"&gt;@berkaysynnada&lt;/a&gt;, 
includes the initial pieces of a&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/pull/14699"&gt;redesigned 
statistics framework&lt;/a&gt;.&lt;strong&gt; DataFusion&amp;rsquo;s optimizer 
can now represent c [...]
+&lt;p&gt;For example, if a filter expression is applied to a column with a 
known uniform distribution range, the optimizer can propagate that to estimate 
result selectivity more accurately. Similarly, comparisons 
(&lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, etc.) on 
columns yield Bernoulli distributions (with true/false probabilities) in this 
model.&lt;/p&gt;
+&lt;p&gt;This is a foundational change with many follow-on PRs underway. Even 
though the immediate user-visible effect is limited (the optimizer didn't 
magically improve by an order of magnitude overnight), it lays the groundwork 
for more advanced query planning in the future. Over time, as statistics 
information encapsulated in &lt;code&gt;Distribution&lt;/code&gt;s gets 
integrated, DataFusion will be able to make smarter decisions like more 
aggressive parquet pruning, better join orderi [...]
+&lt;h3&gt;Aggregate Monotonicity and Window Ordering&lt;/h3&gt;
+&lt;p&gt;DataFusion 46.0.0 adds a new concept of &lt;a 
href="https://github.com/apache/datafusion/pull/14271"&gt;&lt;strong&gt;set-monotonicity&lt;/strong&gt;&lt;/a&gt;
 for certain transformations, which helps avoid unnecessary sort operations. In 
particular, the planner now understands when a &lt;strong&gt;window function 
introduces new orderings of data&lt;/strong&gt;.&lt;/p&gt;
+&lt;p&gt;For example, DataFusion now recognizes that a window-aggregate like 
&lt;code&gt;MAX&lt;/code&gt; on a column can produce a result that is 
&lt;strong&gt;monotonically increasing&lt;/strong&gt;, even if the input column 
is unordered &amp;mdash; depending on the window frame used.&lt;/p&gt;
+&lt;p&gt;Consider the following query:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span 
class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c1&lt;/span&gt;&lt;span 
class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span 
class="p"&gt;(&lt;/span&gt;
+    &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span 
class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="n"&gt;UNBOUNDED&lt;/span&gt; 
&lt;span class="n"&gt;PRECEDING&lt;/span&gt; &lt;span 
class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;CURRENT&lt;/span&gt; 
&lt;span class="k"&gt;ROW&lt;/span&gt;
+&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; 
&lt;span class="n"&gt;max_c1&lt;/span&gt;
+&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;c1_table&lt;/span&gt;
+&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; 
&lt;span class="n"&gt;max_c1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;In earlier versions of DataFusion, this query would require an 
additional &lt;code&gt;SortExec&lt;/code&gt; on &lt;code&gt;max_c1&lt;/code&gt; to satisfy the &lt;code&gt;ORDER BY&lt;/code&gt; clause. However, with the 
new set-monotonicity logic, the planner knows that, for this window frame, &lt;code&gt;MAX(...) OVER (...)&lt;/code&gt; produces 
values that never decrease from one row to the next, making the extra sort 
redundant. This leads to more efficient query execution.&lt;/p&gt;
+&lt;p&gt;PR &lt;a 
href="https://github.com/apache/datafusion/pull/14271"&gt;#14271&lt;/a&gt; 
introduced the core monotonicity tracking for aggregates and window functions.
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/14813"&gt;#14813&lt;/a&gt; 
improved ordering preservation within various window frame types and added 
extensive test coverage.
+Huge thanks to &lt;a 
href="https://github.com/berkaysynnada"&gt;@berkaysynnada&lt;/a&gt; and &lt;a 
href="https://github.com/mertak-synnada"&gt;@mertak-synnada&lt;/a&gt; for 
designing and implementing this optimizer enhancement!&lt;/p&gt;
+&lt;h3&gt;UNION [ALL | DISTINCT] BY NAME Support&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports UNION BY NAME and UNION ALL BY NAME, which 
align columns by name instead of position. This matches functionality found in 
systems like Spark and DuckDB and simplifies combining heterogeneously ordered 
result sets.&lt;/p&gt;
+&lt;p&gt;You no longer need to rewrite column order manually &amp;mdash; just 
write:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span 
class="n"&gt;col1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span 
class="n"&gt;col2&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;t1&lt;/span&gt;
+&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span 
class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span 
class="n"&gt;NAME&lt;/span&gt;
+&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span 
class="n"&gt;col2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span 
class="n"&gt;col1&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;Under the hood, this is supported by the new &lt;code&gt;union_by_name()&lt;/code&gt; and 
&lt;code&gt;union_by_name_distinct()&lt;/code&gt; plan builder methods.&lt;/p&gt;
+&lt;p&gt;Thanks to &lt;a 
href="https://github.com/rkrishn7"&gt;@rkrishn7&lt;/a&gt; for PR &lt;a 
href="https://github.com/apache/datafusion/pull/14538"&gt;#14538&lt;/a&gt;.&lt;/p&gt;
+&lt;h3&gt;New range() Table Function&lt;/h3&gt;
+&lt;p&gt;A new table-valued function range(start, stop, step) has been added 
to make it easy to generate integer sequences &amp;mdash; similar to 
PostgreSQL&amp;rsquo;s generate_series() or Spark&amp;rsquo;s range().&lt;/p&gt;
+&lt;p&gt;Example:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span 
class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;range&lt;/span&gt;&lt;span 
class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span 
class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span 
class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span 
class="p"&gt;);&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;This returns: 1, 3, 5, 7, 9. It&amp;rsquo;s great for testing, cross 
joins, surrogate keys, and more.&lt;/p&gt;
+&lt;p&gt;Thanks to &lt;a 
href="https://github.com/simonvandel"&gt;@simonvandel&lt;/a&gt; for PR &lt;a 
href="https://github.com/apache/datafusion/pull/14830"&gt;#14830&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Upgrade Guide and Changelog&lt;/h2&gt;
+&lt;p&gt;Upgrading to 46.0.0 should be straightforward for most users, but do 
review the&amp;nbsp;&lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide for DataFusion 46.0.0&lt;/a&gt;&amp;nbsp;for detailed steps and code 
changes. The upgrade guide covers the breaking changes mentioned (like 
replacing old exec nodes with&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;, 
updating UDF invocation to&amp;nbsp;&lt;code&gt;invoke_with_args&lt;/code&gt;, 
e [...]
+&lt;h2&gt;Get Involved&lt;/h2&gt;
+&lt;p&gt;Apache DataFusion is an open-source project, and we welcome 
involvement from anyone interested. Now is a great time to take 46.0.0 for a 
spin: try it out on your workloads, and let us know if you encounter any issues 
or have suggestions. You can report bugs or request features on 
our&amp;nbsp;GitHub issue tracker, or better yet, submit a pull request. Join 
our community discussions &amp;ndash; whether you have questions, want to share 
how you&amp;rsquo;re using DataFusion, or ar [...]
+&lt;p&gt;Happy querying!&lt;/p&gt;</content><category 
term="blog"></category></entry><entry><title>Efficient Filter Pushdown in 
Parquet</title><link 
href="https://datafusion.apache.org/blog/2025/03/21/parquet-pushdown" 
rel="alternate"></link><published>2025-03-21T00:00:00+00:00</published><updated>2025-03-21T00:00:00+00:00</updated><author><name>Xiangpeng
 
Hao</name></author><id>tag:datafusion.apache.org,2025-03-21:/blog/2025/03/21/parquet-pushdown</id><summary
 type="html">&lt;style&gt;
 figure {
   margin: 20px 0;
 }
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index dbe61aa..30462c6 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -1,5 +1,107 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - 
blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-03-21T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Efficient
 Filter Pushdown in Parquet</title><link 
href="https://datafusion.apache.org/blog/2025/03/21/parquet-pushdown" rel=" 
[...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - 
blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-03-24T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 46.0.0 Released</title><link 
href="https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0" rel="al 
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;p&gt;We&amp;rsquo;re excited to announce the release 
of&amp;nbsp;&lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new 
version represents a significant milestone for the project, packing in a wide 
range of improvements and fixes. You can find the complete details in 
the&amp;nbsp;full &lt;a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;.
 We&amp;rsquo;ll highlight the most important changes below 
…&lt;/p&gt;</summary> [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;p&gt;We&amp;rsquo;re excited to announce the release 
of&amp;nbsp;&lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new 
version represents a significant milestone for the project, packing in a wide 
range of improvements and fixes. You can find the complete details in 
the&amp;nbsp;full &lt;a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;.
 We&amp;rsquo;ll highlight the most important changes below and guide you 
through [...]
+&lt;h2&gt;Breaking Changes&lt;/h2&gt;
+&lt;p&gt;DataFusion 46.0.0 brings a few&amp;nbsp;&lt;strong&gt;breaking 
changes&lt;/strong&gt;&amp;nbsp;that may require adjustments to your code as 
described in the &lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide&lt;/a&gt;. Here are the most notable ones:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;a 
href="https://github.com/apache/datafusion/pull/14224#"&gt;Unified 
&lt;code&gt;DataSourceExec&lt;/code&gt; Execution 
Plan&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt;&amp;nbsp;DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes 
(&lt;code&gt;ParquetExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;CsvExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;JsonExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;AvroExec&lt;/code&gt;,
 etc.) have been [...]
+&lt;li&gt;&lt;strong&gt;&lt;a 
href="https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2"&gt;Error
 Handling Improvements&lt;/a&gt; 
(&lt;code&gt;DataFusionError::Collection&lt;/code&gt;):&lt;/strong&gt;&amp;nbsp;We began 
overhauling DataFusion&amp;rsquo;s approach to error handling. In this release, 
a new error 
variant&amp;nbsp;&lt;code&gt;DataFusionError::Collection&lt;/code&gt;&amp;nbsp;(and
 related mechanisms) has been introduced to aggregate multiple errors into one. 
This is part of a broader effo [...]
+&lt;/ul&gt;
+&lt;h2&gt;Performance Improvements&lt;/h2&gt;
+&lt;p&gt;DataFusion 46.0.0 comes with a slew of performance enhancements 
across the board. Here are some of the noteworthy optimizations in this 
release:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;Faster&amp;nbsp;&lt;code&gt;median()&lt;/code&gt;&amp;nbsp;(no
 
grouping):&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;median()&lt;/code&gt;&amp;nbsp;aggregate
 function got a special fast path when used without 
a&amp;nbsp;&lt;code&gt;GROUP BY&lt;/code&gt;. By optimizing its accumulator, 
median calculation is about&amp;nbsp;&lt;strong&gt;2&amp;times; 
faster&lt;/strong&gt;&amp;nbsp;in the single-group case. If you 
use&amp;nbsp;&lt;code&gt;MEDIAN()&lt;/code&gt;&a [...]
+&lt;li&gt;&lt;strong&gt;Optimized&amp;nbsp;&lt;code&gt;FIRST_VALUE&lt;/code&gt;/&lt;code&gt;LAST_VALUE&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;FIRST_VALUE&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;LAST_VALUE&lt;/code&gt;&amp;nbsp;window
 functions have been improved by avoiding an internal sort of rows. Instead of 
sorting each partition, the implementation now uses a direct approach to pick 
the first/last element. This yields&amp;nbsp;&lt;strong&gt;10&amp;ndash [...]
+&lt;li&gt;&lt;strong&gt;&lt;code&gt;repeat()&lt;/code&gt;&amp;nbsp;String 
Function Boost:&lt;/strong&gt;&amp;nbsp;Repeating strings is now more efficient 
&amp;ndash; the&amp;nbsp;&lt;code&gt;repeat(text, 
n)&lt;/code&gt;&amp;nbsp;function was optimized by 
about&amp;nbsp;&lt;strong&gt;50%&lt;/strong&gt;. This was achieved by reducing 
allocations and using a more efficient concatenation strategy. If you generate 
large repeated strings in queries, this can cut the time nearly in half (PR &lt 
[...]
+&lt;li&gt;&lt;strong&gt;Ultra-fast&amp;nbsp;&lt;code&gt;uuid()&lt;/code&gt;&amp;nbsp;UDF:&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;uuid()&lt;/code&gt;&amp;nbsp;function
 (which generates random UUID strings) received a major speed-up. 
It&amp;rsquo;s now roughly&amp;nbsp;&lt;strong&gt;40&amp;times; 
faster&lt;/strong&gt;&amp;nbsp;than before! The new implementation avoids 
unnecessary string copying and uses a more direct conversion to hex, making 
bulk UUID generation far more practi [...]
+&lt;li&gt;&lt;strong&gt;Accelerated&amp;nbsp;&lt;code&gt;chr()&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;to_hex()&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;Several
 scalar functions have been micro-optimized. 
The&amp;nbsp;&lt;code&gt;chr()&lt;/code&gt;&amp;nbsp;function (which returns 
the character for a given ASCII code) is 
about&amp;nbsp;&lt;strong&gt;4&amp;times; faster&lt;/strong&gt;&amp;nbsp;now, 
and the&amp;nbsp;&lt;code&gt;to_hex()&lt;/code&gt;&amp;nbsp;function (which 
converts nu [...]
+&lt;li&gt;&lt;strong&gt;No More RowConverter in Grouped 
Ordering:&lt;/strong&gt;&amp;nbsp;We removed an inefficient step in 
the&amp;nbsp;&lt;em&gt;partial grouping&lt;/em&gt;&amp;nbsp;algorithm. 
The&amp;nbsp;&lt;code&gt;GroupOrderingPartial&lt;/code&gt;&amp;nbsp;operator no 
longer converts data to &amp;ldquo;row format&amp;rdquo; for each batch 
(via&amp;nbsp;&lt;code&gt;RowConverter&lt;/code&gt;). Instead, it uses a direct 
arrow-based approach to detect sort key changes. This eliminated  [...]
+&lt;li&gt;&lt;strong&gt;Predicate Pruning for&amp;nbsp;&lt;code&gt;NOT 
LIKE&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;DataFusion&amp;rsquo;s parquet 
reader can now prune row groups using&amp;nbsp;&lt;code&gt;NOT 
LIKE&lt;/code&gt;&amp;nbsp;filters, similar to how it 
handles&amp;nbsp;&lt;code&gt;LIKE&lt;/code&gt;. This means if you have a filter 
such as&amp;nbsp;&lt;code&gt;column NOT LIKE 'prefix%'&lt;/code&gt;, DataFusion 
can use min/max statistics to skip reading files/parts that can be det [...]
+&lt;/ul&gt;
+&lt;h2&gt;Google Summer of Code 2025&lt;/h2&gt;
+&lt;p&gt;Another exciting development:&amp;nbsp;&lt;strong&gt;Apache 
DataFusion has been accepted as a mentoring organization for Google Summer of 
Code (GSoC) 2025&lt;/strong&gt;! 🎉 This means that this summer, students from 
around the world will have the opportunity to contribute to DataFusion under 
the guidance of our committers. We have put together &lt;a 
href="https://datafusion.apache.org/contributor-guide/gsoc_project_ideas.html"&gt;a
 list of project ideas&lt;/a&gt; that candidates [...]
+&lt;p&gt;If you&amp;rsquo;re interested, check out our&amp;nbsp;&lt;a 
href="https://datafusion.apache.org/contributor-guide/gsoc_application_guidelines.html"&gt;GSoC
 Application Guidelines&lt;/a&gt;. We encourage students to reach out, discuss 
ideas with us, and apply.&lt;/p&gt;
+&lt;h2&gt;Highlighted New Features&lt;/h2&gt;
+&lt;h3&gt;Improved Diagnostics&lt;/h3&gt;
+&lt;p&gt;DataFusion 46.0.0 introduces a new&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/issues/14429"&gt;&lt;strong&gt;SQL 
Diagnostics framework&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;to make error messages 
more understandable. This comes in the form of 
new&amp;nbsp;&lt;code&gt;Diagnostic&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;DiagnosticEntry&lt;/code&gt;&amp;nbsp;types,
 which allow the system to attach rich context (like source query text spans) 
to error messages. In pr [...]
+&lt;p&gt;For example, if you reference an unknown table or omit a column from 
&lt;code&gt;GROUP BY&lt;/code&gt;, the error message will include the query 
snippet causing the error. These diagnostics are meant for end users of 
applications built on DataFusion, providing clearer messages instead of generic 
errors. Here&amp;rsquo;s an example:&lt;/p&gt;
+&lt;p&gt;&lt;img alt="diagnostic-example" class="img-responsive" 
src="/blog/images/datafusion-46.0.0/diagnostic-example.png" 
width="80%"/&gt;&lt;/p&gt;
+&lt;p&gt;Currently, diagnostics cover unresolved table/column references, 
missing &lt;code&gt;GROUP BY&lt;/code&gt; columns, ambiguous references, wrong 
number of UNION columns, type mismatches, and a few others. Future releases 
will extend this to more error types. This feature should greatly ease 
debugging of complex SQL by pinpointing errors directly in the query text. We 
thank &lt;a href="https://github.com/eliaperantoni"&gt;@eliaperantoni&lt;/a&gt; 
for his contributions in this proj [...]
+&lt;h3&gt;Unified&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;&amp;nbsp;for
 Table Providers&lt;/h3&gt;
+&lt;p&gt;As mentioned, DataFusion now uses a 
unified&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;&amp;nbsp;for reading 
tables, which is both a breaking change and a feature.&amp;nbsp;&lt;em&gt;Why 
is this important?&lt;/em&gt;&amp;nbsp;The new approach simplifies how custom 
table providers are integrated and optimized. Namely, the optimizer can treat 
file scans uniformly and push down filters/limits more consistently when there 
is one execution plan that handles all data sources. The [...]
+&lt;p&gt;All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have 
been migrated to this framework. This unification makes the codebase cleaner 
and sets the stage for future enhancements (like consistent metadata handling 
and limit pushdown across all formats). Check out PR &lt;a 
href="https://github.com/apache/datafusion/pull/14224"&gt;#14224&lt;/a&gt; for 
design details. We thank &lt;a 
href="https://github.com/mertak-synnada"&gt;@mertak-synnada&lt;/a&gt; and &lt;a 
href="https:/ [...]
+&lt;h3&gt;FFI Support for Scalar UDFs&lt;/h3&gt;
+&lt;p&gt;DataFusion&amp;rsquo;s Foreign Function Interface (FFI) has been 
extended to support&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/pull/14579"&gt;&lt;strong&gt;user-defined
 scalar functions&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;defined in external 
languages. In 46.0.0, you can now expose a custom scalar UDF through the FFI 
layer and use it in DataFusion as if it were built-in. This is particularly 
exciting for the &lt;strong&gt;Python bindings&lt;/strong&gt; and other la [...]
+&lt;h3&gt;New Statistics/Distribution Framework&lt;/h3&gt;
+&lt;p&gt;This release, thanks mainly to &lt;a 
href="https://github.com/Fly-Style"&gt;@Fly-Style&lt;/a&gt; with contributions 
from &lt;a href="https://github.com/ozankabak"&gt;@ozankabak&lt;/a&gt; and 
&lt;a href="https://github.com/berkaysynnada"&gt;@berkaysynnada&lt;/a&gt;, 
includes the initial pieces of a&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/pull/14699"&gt;**redesigned 
statistics framework&lt;/a&gt;.&lt;strong&gt; DataFusion&amp;rsquo;s optimizer 
can now represent c [...]
+&lt;p&gt;For example, if a filter expression is applied to a column with a 
known uniform distribution range, the optimizer can propagate that to estimate 
result selectivity more accurately. Similarly, comparisons 
(&lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, etc.) on 
columns yield Bernoulli distributions (with true/false probabilities) in this 
model.&lt;/p&gt;
+&lt;p&gt;This is a foundational change with many follow-on PRs underway. The 
immediate user-visible effect is limited (the optimizer didn't 
magically improve by an order of magnitude overnight), but it lays the groundwork 
for more advanced query planning in the future. Over time, as the statistics 
information encapsulated in &lt;code&gt;Distribution&lt;/code&gt;s gets 
integrated, DataFusion will be able to make smarter decisions, such as more 
aggressive parquet pruning, better join orderi [...]
+&lt;h3&gt;Aggregate Monotonicity and Window Ordering&lt;/h3&gt;
+&lt;p&gt;DataFusion 46.0.0 adds a new concept of &lt;a 
href="https://github.com/apache/datafusion/pull/14271"&gt;&lt;strong&gt;set-monotonicity&lt;/strong&gt;&lt;/a&gt;
 for certain transformations, which helps avoid unnecessary sort operations. In 
particular, the planner now understands when a &lt;strong&gt;window function 
introduces new orderings of data&lt;/strong&gt;.&lt;/p&gt;
+&lt;p&gt;For example, DataFusion now recognizes that a window-aggregate like 
&lt;code&gt;MAX&lt;/code&gt; on a column can produce a result that is 
&lt;strong&gt;monotonically increasing&lt;/strong&gt;, even if the input column 
is unordered &amp;mdash; depending on the window frame used.&lt;/p&gt;
+&lt;p&gt;Consider the following query:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span 
class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c1&lt;/span&gt;&lt;span 
class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span 
class="p"&gt;(&lt;/span&gt;
+    &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span 
class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="n"&gt;UNBOUNDED&lt;/span&gt; 
&lt;span class="n"&gt;PRECEDING&lt;/span&gt; &lt;span 
class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;CURRENT&lt;/span&gt; 
&lt;span class="k"&gt;ROW&lt;/span&gt;
+&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; 
&lt;span class="n"&gt;max_c1&lt;/span&gt;
+&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;c1_table&lt;/span&gt;
+&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; 
&lt;span class="n"&gt;max_c1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;In earlier versions of DataFusion, this query would require an 
additional &lt;code&gt;SortExec&lt;/code&gt; on &lt;code&gt;max_c1&lt;/code&gt; to satisfy the &lt;code&gt;ORDER BY&lt;/code&gt; clause. However, with the 
new set-monotonicity logic, the planner knows that &lt;code&gt;MAX(...) OVER (...)&lt;/code&gt; with this frame never produces 
a value smaller than that of the previous row, making the extra sort 
redundant. This leads to more efficient query execution.&lt;/p&gt;
+&lt;p&gt;PR &lt;a 
href="https://github.com/apache/datafusion/pull/14271"&gt;#14271&lt;/a&gt; 
introduced the core monotonicity tracking for aggregates and window functions.
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/14813"&gt;#14813&lt;/a&gt; 
improved ordering preservation across various window frame types and added 
extensive test coverage.
+Huge thanks to &lt;a 
href="https://github.com/berkaysynnada"&gt;@berkaysynnada&lt;/a&gt; and &lt;a 
href="https://github.com/mertak-synnada"&gt;@mertak-synnada&lt;/a&gt; for 
designing and implementing this optimizer enhancement!&lt;/p&gt;
+&lt;h3&gt;UNION [ALL | DISTINCT] BY NAME Support&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports UNION BY NAME and UNION ALL BY NAME, which 
align columns by name instead of position. This matches functionality found in 
systems like Spark and DuckDB and simplifies combining heterogeneously ordered 
result sets.&lt;/p&gt;
+&lt;p&gt;You no longer need to rewrite column order manually &amp;mdash; just 
write:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span 
class="n"&gt;col1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span 
class="n"&gt;col2&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;t1&lt;/span&gt;
+&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span 
class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span 
class="n"&gt;NAME&lt;/span&gt;
+&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span 
class="n"&gt;col2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span 
class="n"&gt;col1&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;Under the hood, this is supported by the new union_by_name() and 
union_by_name_distinct() plan builder methods.&lt;/p&gt;
+&lt;p&gt;Thanks to &lt;a 
href="https://github.com/rkrishn7"&gt;@rkrishn7&lt;/a&gt; for PR &lt;a 
href="https://github.com/apache/datafusion/pull/14538"&gt;#14538&lt;/a&gt;.&lt;/p&gt;
+&lt;h3&gt;New range() Table Function&lt;/h3&gt;
+&lt;p&gt;A new table-valued function range(start, stop, step) has been added 
to make it easy to generate integer sequences &amp;mdash; similar to 
PostgreSQL&amp;rsquo;s generate_series() or Spark&amp;rsquo;s range().&lt;/p&gt;
+&lt;p&gt;Example:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span 
class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;range&lt;/span&gt;&lt;span 
class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span 
class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span 
class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span 
class="p"&gt;);&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;This returns: 1, 3, 5, 7, 9. It&amp;rsquo;s great for testing, cross 
joins, surrogate keys, and more.&lt;/p&gt;
+&lt;p&gt;Thanks to &lt;a 
href="https://github.com/simonvandel"&gt;@simonvandel&lt;/a&gt; for PR &lt;a 
href="https://github.com/apache/datafusion/pull/14830"&gt;#14830&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Upgrade Guide and Changelog&lt;/h2&gt;
+&lt;p&gt;Upgrading to 46.0.0 should be straightforward for most users, but do 
review the&amp;nbsp;&lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide for DataFusion 46.0.0&lt;/a&gt;&amp;nbsp;for detailed steps and code 
changes. The upgrade guide covers the breaking changes mentioned (like 
replacing old exec nodes with&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;, 
updating UDF invocation to&amp;nbsp;&lt;code&gt;invoke_with_args&lt;/code&gt;, 
e [...]
+&lt;h2&gt;Get Involved&lt;/h2&gt;
+&lt;p&gt;Apache DataFusion is an open-source project, and we welcome 
involvement from anyone interested. Now is a great time to take 46.0.0 for a 
spin: try it out on your workloads, and let us know if you encounter any issues 
or have suggestions. You can report bugs or request features on 
our&amp;nbsp;GitHub issue tracker, or better yet, submit a pull request. Join 
our community discussions &amp;ndash; whether you have questions, want to share 
how you&amp;rsquo;re using DataFusion, or ar [...]
+&lt;p&gt;Happy querying!&lt;/p&gt;</content><category 
term="blog"></category></entry><entry><title>Efficient Filter Pushdown in 
Parquet</title><link 
href="https://datafusion.apache.org/blog/2025/03/21/parquet-pushdown" 
rel="alternate"></link><published>2025-03-21T00:00:00+00:00</published><updated>2025-03-21T00:00:00+00:00</updated><author><name>Xiangpeng
 
Hao</name></author><id>tag:datafusion.apache.org,2025-03-21:/blog/2025/03/21/parquet-pushdown</id><summary
 type="html">&lt;style&gt;
 figure {
   margin: 20px 0;
 }
diff --git 
a/output/feeds/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.atom.xml 
b/output/feeds/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.atom.xml
new file mode 100644
index 0000000..516826e
--- /dev/null
+++ b/output/feeds/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.atom.xml
@@ -0,0 +1,104 @@
+<?xml version="1.0" encoding="utf-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - 
Oznur Hanci and Berkay Sahin on behalf of the PMC</title><link 
href="https://datafusion.apache.org/blog/" rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.atom.xml"
 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-03-24T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 46.0.0 Released</ [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;p&gt;We&amp;rsquo;re excited to announce the release 
of&amp;nbsp;&lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new 
version represents a significant milestone for the project, packing in a wide 
range of improvements and fixes. You can find the complete details in 
the&amp;nbsp;full &lt;a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;.
 We&amp;rsquo;ll highlight the most important changes below 
…&lt;/p&gt;</summary> [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;p&gt;We&amp;rsquo;re excited to announce the release 
of&amp;nbsp;&lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new 
version represents a significant milestone for the project, packing in a wide 
range of improvements and fixes. You can find the complete details in 
the&amp;nbsp;full &lt;a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;.
 We&amp;rsquo;ll highlight the most important changes below and guide you 
through [...]
+&lt;h2&gt;Breaking Changes&lt;/h2&gt;
+&lt;p&gt;DataFusion 46.0.0 brings a few&amp;nbsp;&lt;strong&gt;breaking 
changes&lt;/strong&gt;&amp;nbsp;that may require adjustments to your code as 
described in the &lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide&lt;/a&gt;. Here are the most notable ones:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;a 
href="https://github.com/apache/datafusion/pull/14224#"&gt;Unified 
&lt;code&gt;DataSourceExec&lt;/code&gt; Execution 
Plan&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt;&amp;nbsp;DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes 
(&lt;code&gt;ParquetExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;CsvExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;JsonExec&lt;/code&gt;,&amp;nbsp;&lt;code&gt;AvroExec&lt;/code&gt;,
 etc.) have been [...]
+&lt;li&gt;&lt;a 
href="https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2"&gt;&lt;strong&gt;Error
 Handling Improvements&lt;/strong&gt;&lt;/a&gt; 
(&lt;code&gt;DataFusionError::Collection&lt;/code&gt;)&lt;strong&gt;:&lt;/strong&gt;&amp;nbsp;We began 
overhauling DataFusion&amp;rsquo;s approach to error handling. In this release, 
a new error 
variant&amp;nbsp;&lt;code&gt;DataFusionError::Collection&lt;/code&gt;&amp;nbsp;(and
 related mechanisms) has been introduced to aggregate multiple errors into one. 
This is part of a broader effo [...]
+&lt;/ul&gt;
+&lt;h2&gt;Performance Improvements&lt;/h2&gt;
+&lt;p&gt;DataFusion 46.0.0 comes with a slew of performance enhancements 
across the board. Here are some of the noteworthy optimizations in this 
release:&lt;/p&gt;
+&lt;ul&gt;
+&lt;li&gt;&lt;strong&gt;Faster&amp;nbsp;&lt;code&gt;median()&lt;/code&gt;&amp;nbsp;(no
 
grouping):&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;median()&lt;/code&gt;&amp;nbsp;aggregate
 function got a special fast path when used without 
a&amp;nbsp;&lt;code&gt;GROUP BY&lt;/code&gt;. By optimizing its accumulator, 
median calculation is about&amp;nbsp;&lt;strong&gt;2&amp;times; 
faster&lt;/strong&gt;&amp;nbsp;in the single-group case. If you 
use&amp;nbsp;&lt;code&gt;MEDIAN()&lt;/code&gt;&a [...]
+&lt;li&gt;&lt;strong&gt;Optimized&amp;nbsp;&lt;code&gt;FIRST_VALUE&lt;/code&gt;/&lt;code&gt;LAST_VALUE&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;FIRST_VALUE&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;LAST_VALUE&lt;/code&gt;&amp;nbsp;window
 functions have been improved by avoiding an internal sort of rows. Instead of 
sorting each partition, the implementation now uses a direct approach to pick 
the first/last element. This yields&amp;nbsp;&lt;strong&gt;10&amp;ndash [...]
+&lt;li&gt;&lt;strong&gt;&lt;code&gt;repeat()&lt;/code&gt;&amp;nbsp;String 
Function Boost:&lt;/strong&gt;&amp;nbsp;Repeating strings is now more efficient 
&amp;ndash; the&amp;nbsp;&lt;code&gt;repeat(text, 
n)&lt;/code&gt;&amp;nbsp;function was optimized by 
about&amp;nbsp;&lt;strong&gt;50%&lt;/strong&gt;. This was achieved by reducing 
allocations and using a more efficient concatenation strategy. If you generate 
large repeated strings in queries, this can cut the time nearly in half (PR &lt 
[...]
+&lt;li&gt;&lt;strong&gt;Ultra-fast&amp;nbsp;&lt;code&gt;uuid()&lt;/code&gt;&amp;nbsp;UDF:&lt;/strong&gt;&amp;nbsp;The&amp;nbsp;&lt;code&gt;uuid()&lt;/code&gt;&amp;nbsp;function
 (which generates random UUID strings) received a major speed-up. 
It&amp;rsquo;s now roughly&amp;nbsp;&lt;strong&gt;40&amp;times; 
faster&lt;/strong&gt;&amp;nbsp;than before! The new implementation avoids 
unnecessary string copying and uses a more direct conversion to hex, making 
bulk UUID generation far more practi [...]
+&lt;li&gt;&lt;strong&gt;Accelerated&amp;nbsp;&lt;code&gt;chr()&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;to_hex()&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;Several
 scalar functions have been micro-optimized. 
The&amp;nbsp;&lt;code&gt;chr()&lt;/code&gt;&amp;nbsp;function (which returns 
the character for a given ASCII code) is 
about&amp;nbsp;&lt;strong&gt;4&amp;times; faster&lt;/strong&gt;&amp;nbsp;now, 
and the&amp;nbsp;&lt;code&gt;to_hex()&lt;/code&gt;&amp;nbsp;function (which 
converts nu [...]
+&lt;li&gt;&lt;strong&gt;No More RowConverter in Grouped 
Ordering:&lt;/strong&gt;&amp;nbsp;We removed an inefficient step in 
the&amp;nbsp;&lt;em&gt;partial grouping&lt;/em&gt;&amp;nbsp;algorithm. 
The&amp;nbsp;&lt;code&gt;GroupOrderingPartial&lt;/code&gt;&amp;nbsp;operator no 
longer converts data to &amp;ldquo;row format&amp;rdquo; for each batch 
(via&amp;nbsp;&lt;code&gt;RowConverter&lt;/code&gt;). Instead, it uses a direct 
arrow-based approach to detect sort key changes. This eliminated  [...]
+&lt;li&gt;&lt;strong&gt;Predicate Pruning for&amp;nbsp;&lt;code&gt;NOT 
LIKE&lt;/code&gt;:&lt;/strong&gt;&amp;nbsp;DataFusion&amp;rsquo;s parquet 
reader can now prune row groups using&amp;nbsp;&lt;code&gt;NOT 
LIKE&lt;/code&gt;&amp;nbsp;filters, similar to how it 
handles&amp;nbsp;&lt;code&gt;LIKE&lt;/code&gt;. This means if you have a filter 
such as&amp;nbsp;&lt;code&gt;column NOT LIKE 'prefix%'&lt;/code&gt;, DataFusion 
can use min/max statistics to skip reading files/parts that can be det [...]
+&lt;/ul&gt;
+&lt;h2&gt;Google Summer of Code 2025&lt;/h2&gt;
+&lt;p&gt;Another exciting development:&amp;nbsp;&lt;strong&gt;Apache 
DataFusion has been accepted as a mentoring organization for Google Summer of 
Code (GSoC) 2025&lt;/strong&gt;! 🎉 This means that this summer, students from 
around the world will have the opportunity to contribute to DataFusion under 
the guidance of our committers. We have put together &lt;a 
href="https://datafusion.apache.org/contributor-guide/gsoc_project_ideas.html"&gt;a
 list of project ideas&lt;/a&gt; that candidates [...]
+&lt;p&gt;If you&amp;rsquo;re interested, check out our&amp;nbsp;&lt;a 
href="https://datafusion.apache.org/contributor-guide/gsoc_application_guidelines.html"&gt;GSoC
 Application Guidelines&lt;/a&gt;. We encourage students to reach out, discuss 
ideas with us, and apply.&lt;/p&gt;
+&lt;h2&gt;Highlighted New Features&lt;/h2&gt;
+&lt;h3&gt;Improved Diagnostics&lt;/h3&gt;
+&lt;p&gt;DataFusion 46.0.0 introduces a new&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/issues/14429"&gt;&lt;strong&gt;SQL 
Diagnostics framework&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;to make error messages 
more understandable. This comes in the form of 
new&amp;nbsp;&lt;code&gt;Diagnostic&lt;/code&gt;&amp;nbsp;and&amp;nbsp;&lt;code&gt;DiagnosticEntry&lt;/code&gt;&amp;nbsp;types,
 which allow the system to attach rich context (like source query text spans) 
to error messages. In pr [...]
+&lt;p&gt;For example, if you reference an unknown table or omit a column from 
&lt;code&gt;GROUP BY&lt;/code&gt;, the error message will include the query 
snippet causing the error. These diagnostics are meant for end-users of 
applications built on DataFusion, providing clearer messages instead of generic 
errors. Here&amp;rsquo;s an example:&lt;/p&gt;
+&lt;p&gt;&lt;img alt="diagnostic-example" class="img-responsive" 
src="/blog/images/datafusion-46.0.0/diagnostic-example.png" 
width="80%"/&gt;&lt;/p&gt;
+&lt;p&gt;Currently, diagnostics cover unresolved table/column references, 
missing &lt;code&gt;GROUP BY&lt;/code&gt; columns, ambiguous references, wrong 
number of UNION columns, type mismatches, and a few others. Future releases 
will extend this to more error types. This feature should greatly ease 
debugging of complex SQL by pinpointing errors directly in the query text. We 
thank &lt;a href="https://github.com/eliaperantoni"&gt;@eliaperantoni&lt;/a&gt; 
for his contributions in this proj [...]
+&lt;h3&gt;Unified&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;&amp;nbsp;for
 Table Providers&lt;/h3&gt;
+&lt;p&gt;As mentioned, DataFusion now uses a 
unified&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;&amp;nbsp;for reading 
tables, which is both a breaking change and a feature.&amp;nbsp;&lt;em&gt;Why 
is this important?&lt;/em&gt;&amp;nbsp;The new approach simplifies how custom 
table providers are integrated and optimized. Namely, the optimizer can treat 
file scans uniformly and push down filters/limits more consistently when there 
is one execution plan that handles all data sources. The [...]
+&lt;p&gt;All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have 
been migrated to this framework. This unification makes the codebase cleaner 
and sets the stage for future enhancements (like consistent metadata handling 
and limit pushdown across all formats). Check out PR &lt;a 
href="https://github.com/apache/datafusion/pull/14224"&gt;#14224&lt;/a&gt; for 
design details. We thank &lt;a 
href="https://github.com/mertak-synnada"&gt;@mertak-synnada&lt;/a&gt; and &lt;a 
href="https:/ [...]
+&lt;h3&gt;FFI Support for Scalar UDFs&lt;/h3&gt;
+&lt;p&gt;DataFusion&amp;rsquo;s Foreign Function Interface (FFI) has been 
extended to support&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/pull/14579"&gt;&lt;strong&gt;user-defined
 scalar functions&lt;/strong&gt;&lt;/a&gt;&amp;nbsp;defined in external 
languages. In 46.0.0, you can now expose a custom scalar UDF through the FFI 
layer and use it in DataFusion as if it were built-in. This is particularly 
exciting for the &lt;strong&gt;Python bindings&lt;/strong&gt; and other la [...]
+&lt;h3&gt;New Statistics/Distribution Framework&lt;/h3&gt;
+&lt;p&gt;This release, thanks mainly to &lt;a 
href="https://github.com/Fly-Style"&gt;@Fly-Style&lt;/a&gt; with contributions 
from &lt;a href="https://github.com/ozankabak"&gt;@ozankabak&lt;/a&gt; and 
&lt;a href="https://github.com/berkaysynnada"&gt;@berkaysynnada&lt;/a&gt;, 
includes the initial pieces of a&amp;nbsp;&lt;a 
href="https://github.com/apache/datafusion/pull/14699"&gt;**redesigned 
statistics framework&lt;/a&gt;.&lt;strong&gt; DataFusion&amp;rsquo;s optimizer 
can now represent c [...]
+&lt;p&gt;For example, if a filter expression is applied to a column with a 
known uniform distribution range, the optimizer can propagate that to estimate 
result selectivity more accurately. Similarly, comparisons 
(&lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, etc.) on 
columns yield Bernoulli distributions (with true/false probabilities) in this 
model.&lt;/p&gt;
+&lt;p&gt;This is a foundational change with many follow-on PRs underway. The 
immediate user-visible effect is limited (the optimizer didn't 
magically improve by an order of magnitude overnight), but it lays the groundwork 
for more advanced query planning in the future. Over time, as the statistics 
information encapsulated in &lt;code&gt;Distribution&lt;/code&gt;s gets 
integrated, DataFusion will be able to make smarter decisions, such as more 
aggressive parquet pruning, better join orderi [...]
+&lt;h3&gt;Aggregate Monotonicity and Window Ordering&lt;/h3&gt;
+&lt;p&gt;DataFusion 46.0.0 adds a new concept of &lt;a 
href="https://github.com/apache/datafusion/pull/14271"&gt;&lt;strong&gt;set-monotonicity&lt;/strong&gt;&lt;/a&gt;
 for certain transformations, which helps avoid unnecessary sort operations. In 
particular, the planner now understands when a &lt;strong&gt;window function 
introduces new orderings of data&lt;/strong&gt;.&lt;/p&gt;
+&lt;p&gt;For example, DataFusion now recognizes that a window-aggregate like 
&lt;code&gt;MAX&lt;/code&gt; on a column can produce a result that is 
&lt;strong&gt;monotonically increasing&lt;/strong&gt;, even if the input column 
is unordered &amp;mdash; depending on the window frame used.&lt;/p&gt;
+&lt;p&gt;Consider the following query:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MAX&lt;/span&gt;&lt;span 
class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c1&lt;/span&gt;&lt;span 
class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span 
class="p"&gt;(&lt;/span&gt;
+    &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span 
class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="n"&gt;UNBOUNDED&lt;/span&gt; 
&lt;span class="n"&gt;PRECEDING&lt;/span&gt; &lt;span 
class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;CURRENT&lt;/span&gt; 
&lt;span class="k"&gt;ROW&lt;/span&gt;
+&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; 
&lt;span class="n"&gt;max_c1&lt;/span&gt;
+&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;c1_table&lt;/span&gt;
+&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; 
&lt;span class="n"&gt;max_c1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;In earlier versions of DataFusion, this query would require an 
additional &lt;code&gt;SortExec&lt;/code&gt; on &lt;code&gt;max_c1&lt;/code&gt; to satisfy the &lt;code&gt;ORDER BY&lt;/code&gt; clause. However, with the 
new set-monotonicity logic, the planner knows that &lt;code&gt;MAX(...) OVER (...)&lt;/code&gt; with this frame never produces 
a value smaller than that of the previous row, making the extra sort 
redundant. This leads to more efficient query execution.&lt;/p&gt;
+&lt;p&gt;PR &lt;a 
href="https://github.com/apache/datafusion/pull/14271"&gt;#14271&lt;/a&gt; 
introduced the core monotonicity tracking for aggregates and window functions.
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/14813"&gt;#14813&lt;/a&gt; 
improved ordering preservation across various window frame types and added 
extensive test coverage.
+Huge thanks to &lt;a 
href="https://github.com/berkaysynnada"&gt;@berkaysynnada&lt;/a&gt; and &lt;a 
href="https://github.com/mertak-synnada"&gt;@mertak-synnada&lt;/a&gt; for 
designing and implementing this optimizer enhancement!&lt;/p&gt;
+&lt;h3&gt;UNION [ALL | DISTINCT] BY NAME Support&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports UNION BY NAME and UNION ALL BY NAME, which 
align columns by name instead of position. This matches functionality found in 
systems like Spark and DuckDB and simplifies combining heterogeneously ordered 
result sets.&lt;/p&gt;
+&lt;p&gt;You no longer need to rewrite column order manually &amp;mdash; just 
write:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span 
class="n"&gt;col1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span 
class="n"&gt;col2&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;t1&lt;/span&gt;
+&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span 
class="k"&gt;ALL&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span 
class="n"&gt;NAME&lt;/span&gt;
+&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span 
class="n"&gt;col2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span 
class="n"&gt;col1&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span 
class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;Under the hood, this is supported by the new union_by_name() and 
union_by_name_distinct() plan builder methods.&lt;/p&gt;
+&lt;p&gt;Thanks to &lt;a 
href="https://github.com/rkrishn7"&gt;@rkrishn7&lt;/a&gt; for PR &lt;a 
href="https://github.com/apache/datafusion/pull/14538"&gt;#14538&lt;/a&gt;.&lt;/p&gt;
+&lt;h3&gt;New range() Table Function&lt;/h3&gt;
+&lt;p&gt;A new table-valued function range(start, stop, step) has been added 
to make it easy to generate integer sequences &amp;mdash; similar to 
PostgreSQL&amp;rsquo;s generate_series() or Spark&amp;rsquo;s range().&lt;/p&gt;
+&lt;p&gt;Example:&lt;/p&gt;
+&lt;div 
class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span 
class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span 
class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;range&lt;/span&gt;&lt;span 
class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span 
class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span 
class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span 
class="p"&gt;);&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
+&lt;p&gt;This returns: 1, 3, 5, 7, 9. It&amp;rsquo;s great for testing, cross 
joins, surrogate keys, and more.&lt;/p&gt;
+&lt;p&gt;Thanks to &lt;a 
href="https://github.com/simonvandel"&gt;@simonvandel&lt;/a&gt; for PR &lt;a 
href="https://github.com/apache/datafusion/pull/14830"&gt;#14830&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Upgrade Guide and Changelog&lt;/h2&gt;
+&lt;p&gt;Upgrading to 46.0.0 should be straightforward for most users, but do 
review the&amp;nbsp;&lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide for DataFusion 46.0.0&lt;/a&gt;&amp;nbsp;for detailed steps and code 
changes. The upgrade guide covers the breaking changes mentioned (like 
replacing old exec nodes with&amp;nbsp;&lt;code&gt;DataSourceExec&lt;/code&gt;, 
updating UDF invocation to&amp;nbsp;&lt;code&gt;invoke_with_args&lt;/code&gt;, 
e [...]
+&lt;h2&gt;Get Involved&lt;/h2&gt;
+&lt;p&gt;Apache DataFusion is an open-source project, and we welcome 
involvement from anyone interested. Now is a great time to take 46.0.0 for a 
spin: try it out on your workloads, and let us know if you encounter any issues 
or have suggestions. You can report bugs or request features on 
our&amp;nbsp;GitHub issue tracker, or better yet, submit a pull request. Join 
our community discussions &amp;ndash; whether you have questions, want to share 
how you&amp;rsquo;re using DataFusion, or ar [...]
+&lt;p&gt;Happy querying!&lt;/p&gt;</content><category 
term="blog"></category></entry></feed>
\ No newline at end of file
diff --git 
a/output/feeds/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.rss.xml 
b/output/feeds/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.rss.xml
new file mode 100644
index 0000000..177a08c
--- /dev/null
+++ b/output/feeds/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.rss.xml
@@ -0,0 +1,18 @@
+<?xml version="1.0" encoding="utf-8"?>
+<rss version="2.0"><channel><title>Apache DataFusion Blog - Oznur Hanci and 
Berkay Sahin on behalf of the 
PMC</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
 24 Mar 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion 
46.0.0 
Released</title><link>https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0</link><description>&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;p&gt;We&amp;rsquo;re excited to announce the release 
of&amp;nbsp;&lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new 
version represents a significant milestone for the project, packing in a wide 
range of improvements and fixes. You can find the complete details in 
the&amp;nbsp;full &lt;a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;.
 We&amp;rsquo;ll highlight the most important changes below 
…&lt;/p&gt;</descript [...]
\ No newline at end of file
diff --git a/output/images/datafusion-46.0.0/diagnostic-example.png 
b/output/images/datafusion-46.0.0/diagnostic-example.png
new file mode 100644
index 0000000..0dc3df5
Binary files /dev/null and 
b/output/images/datafusion-46.0.0/diagnostic-example.png differ
diff --git a/output/index.html b/output/index.html
index ce50f7d..306c43c 100644
--- a/output/index.html
+++ b/output/index.html
@@ -44,6 +44,41 @@
             <p><i>Here you can find the latest updates from DataFusion and 
related projects.</i></p>
 
 
+    <!-- Post -->
+    <div class="row">
+        <div class="callout">
+            <article class="post">
+                <header>
+                    <div class="title">
+                        <h1><a 
href="/blog/2025/03/24/datafusion-46.0.0">Apache DataFusion 46.0.0 
Released</a></h1>
+                        <p>Posted on: Mon 24 March 2025 by Oznur Hanci and 
Berkay Sahin on behalf of the PMC</p>
+                        <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>We&rsquo;re excited to announce the release of&nbsp;<strong>Apache 
DataFusion 46.0.0</strong>! This new version represents a significant milestone 
for the project, packing in a wide range of improvements and fixes. You can 
find the complete details in the&nbsp;full <a 
href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md">changelog</a>.
 We&rsquo;ll highlight the most important changes below …</p></p>
+                        <footer>
+                            <ul class="actions">
+                                <div style="text-align: right"><a 
href="/blog/2025/03/24/datafusion-46.0.0" class="button medium">Continue 
Reading</a></div>
+                            </ul>
+                            <ul class="stats">
+                            </ul>
+                        </footer>
+            </article>
+        </div>
+    </div>
     <!-- Post -->
     <div class="row">
         <div class="callout">

