nuno-faria commented on code in PR #135:
URL: https://github.com/apache/datafusion-site/pull/135#discussion_r2722875997


##########
content/blog/2026-01-08-datafusion-52.0.0.md:
##########
@@ -0,0 +1,379 @@
+---
+layout: post
+title: Apache DataFusion 52.0.0 Released
+date: 2026-01-08
+author: pmc
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+We are proud to announce the release of [DataFusion 52.0.0]. This post highlights
+some of the major improvements since [DataFusion 51.0.0]. The complete list of
+changes is available in the [changelog]. Thanks to the [121 contributors] for
+making this release possible.
+
+TODO: confirm the release date for 52.0.0 and update the front matter if needed.
+
+[DataFusion 52.0.0]: https://crates.io/crates/datafusion/52.0.0
+[DataFusion 51.0.0]: https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0/
+[changelog]: https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md
+[121 contributors]: https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md#credits
+
+## Performance Improvements 🚀
+
+We continue to make significant performance improvements in DataFusion as explained below.
+
+### Faster `CASE` Expressions
+
+DataFusion 52 has lookup-table-based evaluation for certain `CASE` expressions
+to avoid repeated evaluation for accelerating common ETL patterns such as
+
+```sql
+CASE company
+    WHEN 1 THEN 'Apple'
+    WHEN 5 THEN 'Samsung'
+    WHEN 2 THEN 'Motorola'
+    WHEN 3 THEN 'LG'
+    ELSE 'Other'
+END
+```
+
+This is the final work in our `CASE` performance epic ([#18075]), which has
+improved `CASE` evaluation significantly. Related PRs [#18183]. Thanks to
+[rluvaton] and [pepijnve] for the implementation.
+
+[rluvaton]: https://github.com/rluvaton
+[pepijnve]: https://github.com/pepijnve
+
+
+[#18075]: https://github.com/apache/datafusion/issues/18075
+[#18183]: https://github.com/apache/datafusion/pull/18183
+
+### New Merge Join
+
+DataFusion 52 includes a rewrite of the sort-merge join (SMJ) operator, with
+speedups of three orders of magnitude in some pathological cases such as the
+case in [#18487], which also affected [Apache Comet] workloads. Benchmarks in
+[#18875] show dramatic gains for TPC-H Q21 (minutes to milliseconds) while
+leaving other queries unchanged or modestly faster. Thanks to [mbutrovich] for
+the implementation and reviews from [Dandandan].
+
+[#18487]: https://github.com/apache/datafusion/issues/18487
+[#18875]: https://github.com/apache/datafusion/pull/18875
+[Apache Comet]: https://datafusion.apache.org/comet/
+[mbutrovich]: https://github.com/mbutrovich
+
+### Rewritten merge join
+
+DataFusion 52 includes a rewrite of the sort-merge join (SMJ) output buffering to
+avoid excessive `concat_batches` work and to use `BatchCoalescer` internally and
+for final output. This change targets pathological slowdowns like the reported
+LeftAnti join case in [#18487], which also affected Comet workloads that rely on
+SMJ. Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
+minutes to milliseconds) while leaving most other queries unchanged or modestly
+faster, and the update is fully internal with no user-facing API changes.
+
+
+### Caching Improvements
+
+This release also includes several additional caching improvements.
+
+A new statistics cache for Parquet Metadata avoids repeatedly (re)calculating

Review Comment:
   nit: maybe "Parquet Metadata" -> "File Metadata", since there is also a separate cache for the Parquet metadata itself?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

