ozankabak commented on code in PR #64:
URL: https://github.com/apache/datafusion-site/pull/64#discussion_r2010334544


##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC

Review Comment:
   ```suggestion
   author: Oznur Hanci and Berkay Sahin on behalf of the PMC
   ```



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics 
framework**](https://github.com/apache/datafusion/issues/14429) to make error 
messages more understandable. This comes in the form of new `Diagnostic` and 
`DiagnosticEntry` types, which allow the system to attach rich context (like 
source query text spans) to error messages. In practical terms, certain planner 
errors will now point to the exact location in your SQL query that caused the 
issue. 
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` 
the error message will include the query snippet causing the error. These 
diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY`columns, ambiguous 
references, wrong number of UNION columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. Thanks to [@eliaperantoni](https://github.com/eliaperantoni).
+
+### Unified `DataSourceExec` for Table Providers
+
+As mentioned, DataFusion now uses a unified `DataSourceExec` for reading 
tables, which is both a breaking change and a feature. **Why is this 
exciting?** Because it simplifies how custom table providers are integrated and 
optimized. With one execution plan to handle all data sources, the optimizer 
can treat file scans uniformly and push down filters/limits more consistently. 
The new `DataSourceExec` is paired with a `DataSource` trait that encapsulates 
format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in a pluggable way.
+
+All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been 
migrated to this framework. This unification makes the codebase cleaner and 
sets the stage for future enhancements (like consistent metadata handling and 
limit pushdown across all formats). Check out PR 
[#14224](https://github.com/apache/datafusion/pull/14224) for design details, 
thanks to [@mertak-synnada](https://github.com/mertak-synnada) and 
[@ozankabak](https://github.com/ozankabak).
+
+### FFI Support for Scalar UDFs
+
+DataFusion’s Foreign Function Interface (FFI) has been extended to support 
[**user-defined scalar 
functions**](https://github.com/apache/datafusion/pull/14579) defined in 
external languages. In 46.0.0, you can now expose a custom scalar UDF through 
the FFI layer and use it in DataFusion as if it were built-in. This is 
particularly exciting for the **Python bindings** and other language 
integrations – it means you could define a function in Python (or C, etc.) and 
register it with DataFusion’s Rust core via the FFI crate. Thanks, 
[@timsaucer](https://github.com/timsaucer)!
+
+### Distribution Framework
+
+This release, thanks mainly to [@Fly-Style](https://github.com/Fly-Style) with 
contributions from [@ozankabak](https://github.com/ozankabak) and 
[@berkaysynnada](https://github.com/berkaysynnada), includes the initial pieces 
of a [**redesigned statistics 
framework](https://github.com/apache/datafusion/pull/14699).** DataFusion’s 
optimizer can now represent column data distributions using a new 
`Distribution` enum, instead of the old precision or range estimations. The 
supported distribution types currently include **Uniform, Gaussian (normal), 
Exponential, Bernoulli**, and an **Unknown** catch-all.

Review Comment:
   ```suggestion
   This release, thanks mainly to [@Fly-Style](https://github.com/Fly-Style) 
with contributions from [@ozankabak](https://github.com/ozankabak) and 
[@berkaysynnada](https://github.com/berkaysynnada), includes the initial pieces 
of a [**redesigned statistics 
framework**](https://github.com/apache/datafusion/pull/14699). DataFusion’s 
optimizer can now represent column data distributions using a new 
`Distribution` enum, instead of the old precision or range estimations. The 
supported distribution types currently include **Uniform, Gaussian (normal), 
Exponential, Bernoulli**, and a **Generic** catch-all.
   ```



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics 
framework**](https://github.com/apache/datafusion/issues/14429) to make error 
messages more understandable. This comes in the form of new `Diagnostic` and 
`DiagnosticEntry` types, which allow the system to attach rich context (like 
source query text spans) to error messages. In practical terms, certain planner 
errors will now point to the exact location in your SQL query that caused the 
issue. 
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` 
the error message will include the query snippet causing the error. These 
diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY`columns, ambiguous 
references, wrong number of UNION columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. Thanks to [@eliaperantoni](https://github.com/eliaperantoni).

Review Comment:
   ```suggestion
   For example, if you reference an unknown table or miss a column in 
`GROUP BY`, the error message will include the query snippet causing the error. 
These diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY` columns, ambiguous 
references, wrong number of `UNION` columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. We thank [@eliaperantoni](https://github.com/eliaperantoni) for 
his contributions to this project.
   ```



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics 
framework**](https://github.com/apache/datafusion/issues/14429) to make error 
messages more understandable. This comes in the form of new `Diagnostic` and 
`DiagnosticEntry` types, which allow the system to attach rich context (like 
source query text spans) to error messages. In practical terms, certain planner 
errors will now point to the exact location in your SQL query that caused the 
issue. 
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` 
the error message will include the query snippet causing the error. These 
diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY`columns, ambiguous 
references, wrong number of UNION columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. Thanks to [@eliaperantoni](https://github.com/eliaperantoni).
+
+### Unified `DataSourceExec` for Table Providers
+
+As mentioned, DataFusion now uses a unified `DataSourceExec` for reading 
tables, which is both a breaking change and a feature. **Why is this 
exciting?** Because it simplifies how custom table providers are integrated and 
optimized. With one execution plan to handle all data sources, the optimizer 
can treat file scans uniformly and push down filters/limits more consistently. 
The new `DataSourceExec` is paired with a `DataSource` trait that encapsulates 
format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in a pluggable way.
+
+All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been 
migrated to this framework. This unification makes the codebase cleaner and 
sets the stage for future enhancements (like consistent metadata handling and 
limit pushdown across all formats). Check out PR 
[#14224](https://github.com/apache/datafusion/pull/14224) for design details, 
thanks to [@mertak-synnada](https://github.com/mertak-synnada) and 
[@ozankabak](https://github.com/ozankabak).
+
+### FFI Support for Scalar UDFs
+
+DataFusion’s Foreign Function Interface (FFI) has been extended to support 
[**user-defined scalar 
functions**](https://github.com/apache/datafusion/pull/14579) defined in 
external languages. In 46.0.0, you can now expose a custom scalar UDF through 
the FFI layer and use it in DataFusion as if it were built-in. This is 
particularly exciting for the **Python bindings** and other language 
integrations – it means you could define a function in Python (or C, etc.) and 
register it with DataFusion’s Rust core via the FFI crate. Thanks, 
[@timsaucer](https://github.com/timsaucer)!
+
+### Distribution Framework
+
+This release, thanks mainly to [@Fly-Style](https://github.com/Fly-Style) with 
contributions from [@ozankabak](https://github.com/ozankabak) and 
[@berkaysynnada](https://github.com/berkaysynnada), includes the initial pieces 
of a [**redesigned statistics 
framework](https://github.com/apache/datafusion/pull/14699).** DataFusion’s 
optimizer can now represent column data distributions using a new 
`Distribution` enum, instead of the old precision or range estimations. The 
supported distribution types currently include **Uniform, Gaussian (normal), 
Exponential, Bernoulli**, and an **Unknown** catch-all.
+
+For example, if a filter expression is applied to a column with a known 
uniform distribution range, the optimizer can propagate that to estimate result 
selectivity more accurately. Similarly, comparisons (=, >, etc.) on columns 
yield Bernoulli distributions (true/false probabilities) in this model
+
+This is a foundational change: the immediate user-visible effect is limited 
(the optimizer won’t yet magically become a genius), but it lays groundwork for 
more advanced query planning in the future. Over time, as these 
`Distribution`'s get integrated, DataFusion will be able to make smarter 
decisions like more aggressive parquet pruning, better join orderings, and so 
on based on data distribution assumptions. Note that for now, the framework is 
in place and being hooked up (with basic support for propagation of Uniform and 
handling of some combos), but it’s not fully exploited yet. Think of this as an 
investment in performance improvements down the line.

Review Comment:
   ```suggestion
   This is a foundational change with many follow-on PRs underway. The 
immediate user-visible effect is limited (the optimizer didn't magically 
improve by an order of magnitude overnight), but it lays the groundwork for 
more advanced query planning in the future. Over time, as the statistics 
information encapsulated in `Distribution`s gets integrated, DataFusion will be 
able to make smarter decisions, such as more aggressive Parquet pruning and 
better join orderings, based on data distribution information. The core 
framework is now in place and is being hooked up to column- and table-level 
statistics.
   ```



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics 
framework**](https://github.com/apache/datafusion/issues/14429) to make error 
messages more understandable. This comes in the form of new `Diagnostic` and 
`DiagnosticEntry` types, which allow the system to attach rich context (like 
source query text spans) to error messages. In practical terms, certain planner 
errors will now point to the exact location in your SQL query that caused the 
issue. 
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` 
the error message will include the query snippet causing the error. These 
diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY`columns, ambiguous 
references, wrong number of UNION columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. Thanks to [@eliaperantoni](https://github.com/eliaperantoni).
+
+### Unified `DataSourceExec` for Table Providers
+
+As mentioned, DataFusion now uses a unified `DataSourceExec` for reading 
tables, which is both a breaking change and a feature. **Why is this 
exciting?** Because it simplifies how custom table providers are integrated and 
optimized. With one execution plan to handle all data sources, the optimizer 
can treat file scans uniformly and push down filters/limits more consistently. 
The new `DataSourceExec` is paired with a `DataSource` trait that encapsulates 
format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in a pluggable way.
+
+All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been 
migrated to this framework. This unification makes the codebase cleaner and 
sets the stage for future enhancements (like consistent metadata handling and 
limit pushdown across all formats). Check out PR 
[#14224](https://github.com/apache/datafusion/pull/14224) for design details, 
thanks to [@mertak-synnada](https://github.com/mertak-synnada) and 
[@ozankabak](https://github.com/ozankabak).
+
+### FFI Support for Scalar UDFs
+
+DataFusion’s Foreign Function Interface (FFI) has been extended to support 
[**user-defined scalar 
functions**](https://github.com/apache/datafusion/pull/14579) defined in 
external languages. In 46.0.0, you can now expose a custom scalar UDF through 
the FFI layer and use it in DataFusion as if it were built-in. This is 
particularly exciting for the **Python bindings** and other language 
integrations – it means you could define a function in Python (or C, etc.) and 
register it with DataFusion’s Rust core via the FFI crate. Thanks, 
[@timsaucer](https://github.com/timsaucer)!
+
+### Distribution Framework
+
+This release, thanks mainly to [@Fly-Style](https://github.com/Fly-Style) with 
contributions from [@ozankabak](https://github.com/ozankabak) and 
[@berkaysynnada](https://github.com/berkaysynnada), includes the initial pieces 
of a [**redesigned statistics 
framework](https://github.com/apache/datafusion/pull/14699).** DataFusion’s 
optimizer can now represent column data distributions using a new 
`Distribution` enum, instead of the old precision or range estimations. The 
supported distribution types currently include **Uniform, Gaussian (normal), 
Exponential, Bernoulli**, and an **Unknown** catch-all.
+
+For example, if a filter expression is applied to a column with a known 
uniform distribution range, the optimizer can propagate that to estimate result 
selectivity more accurately. Similarly, comparisons (=, >, etc.) on columns 
yield Bernoulli distributions (true/false probabilities) in this model

Review Comment:
   ```suggestion
   For example, if a filter expression is applied to a column with a known 
uniform distribution range, the optimizer can propagate that to estimate result 
selectivity more accurately. Similarly, comparisons (`=`, `>`, etc.) on columns 
yield Bernoulli distributions (with true/false probabilities) in this model.
   ```
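
To make the Bernoulli point concrete for readers, here is a minimal standalone sketch in plain Python. It does not use DataFusion's API: the `Uniform` and `Bernoulli` classes and the `greater_than` and `estimate_rows` helpers are hypothetical names introduced only for illustration, and the real `Distribution` enum may compute this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uniform:
    """Hypothetical stand-in for a column known to be uniform on [low, high]."""
    low: float
    high: float


@dataclass(frozen=True)
class Bernoulli:
    """Hypothetical stand-in for a boolean result that is true with probability p."""
    p: float


def greater_than(col: Uniform, threshold: float) -> Bernoulli:
    """Model `col > threshold` as a Bernoulli distribution.

    For X uniform on [low, high], P(X > t) = (high - t) / (high - low),
    clamped to [0, 1] when t falls outside the range.
    """
    if col.high <= col.low:
        raise ValueError("expected low < high")
    fraction = (col.high - threshold) / (col.high - col.low)
    return Bernoulli(min(1.0, max(0.0, fraction)))


def estimate_rows(total_rows: int, predicate: Bernoulli) -> int:
    """Estimate how many rows pass a filter whose outcome is `predicate`."""
    return round(total_rows * predicate.p)


if __name__ == "__main__":
    column = Uniform(low=0.0, high=100.0)
    predicate = greater_than(column, 75.0)
    print(predicate.p, estimate_rows(1_000_000, predicate))
```

With a column uniform on [0, 100], the filter `col > 75` comes out as a Bernoulli with `p = 0.25`, so the estimate for a million-row table is 250,000 rows passing the filter.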



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).

Review Comment:
   ```suggestion
   - **[Unified `DataSourceExec` Execution 
Plan](https://github.com/apache/datafusion/pull/14224):** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single 
`DataSourceExec` plan**. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
   ```



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics 
framework**](https://github.com/apache/datafusion/issues/14429) to make error 
messages more understandable. This comes in the form of new `Diagnostic` and 
`DiagnosticEntry` types, which allow the system to attach rich context (like 
source query text spans) to error messages. In practical terms, certain planner 
errors will now point to the exact location in your SQL query that caused the 
issue. 
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` 
the error message will include the query snippet causing the error. These 
diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY`columns, ambiguous 
references, wrong number of UNION columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. Thanks to [@eliaperantoni](https://github.com/eliaperantoni).
+
+### Unified `DataSourceExec` for Table Providers
+
+As mentioned, DataFusion now uses a unified `DataSourceExec` for reading 
tables, which is both a breaking change and a feature. **Why is this 
exciting?** Because it simplifies how custom table providers are integrated and 
optimized. With one execution plan to handle all data sources, the optimizer 
can treat file scans uniformly and push down filters/limits more consistently. 
The new `DataSourceExec` is paired with a `DataSource` trait that encapsulates 
format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in a pluggable way.
+
+All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been 
migrated to this framework. This unification makes the codebase cleaner and 
sets the stage for future enhancements (like consistent metadata handling and 
limit pushdown across all formats). Check out PR 
[#14224](https://github.com/apache/datafusion/pull/14224) for design details, 
thanks to [@mertak-synnada](https://github.com/mertak-synnada) and 
[@ozankabak](https://github.com/ozankabak).
+
+### FFI Support for Scalar UDFs
+
+DataFusion’s Foreign Function Interface (FFI) has been extended to support 
[**user-defined scalar 
functions**](https://github.com/apache/datafusion/pull/14579) defined in 
external languages. In 46.0.0, you can now expose a custom scalar UDF through 
the FFI layer and use it in DataFusion as if it were built-in. This is 
particularly exciting for the **Python bindings** and other language 
integrations – it means you could define a function in Python (or C, etc.) and 
register it with DataFusion’s Rust core via the FFI crate. Thanks, 
[@timsaucer](https://github.com/timsaucer)!
+
+### Distribution Framework

Review Comment:
   ```suggestion
   ### New Statistics/Distribution Framework
   ```



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics 
framework**](https://github.com/apache/datafusion/issues/14429) to make error 
messages more understandable. This comes in the form of new `Diagnostic` and 
`DiagnosticEntry` types, which allow the system to attach rich context (like 
source query text spans) to error messages. In practical terms, certain planner 
errors will now point to the exact location in your SQL query that caused the 
issue. 
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` 
the error message will include the query snippet causing the error. These 
diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY`columns, ambiguous 
references, wrong number of UNION columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. Thanks to [@eliaperantoni](https://github.com/eliaperantoni).
+
+### Unified `DataSourceExec` for Table Providers
+
+As mentioned, DataFusion now uses a unified `DataSourceExec` for reading 
tables, which is both a breaking change and a feature. **Why is this 
exciting?** Because it simplifies how custom table providers are integrated and 
optimized. With one execution plan to handle all data sources, the optimizer 
can treat file scans uniformly and push down filters/limits more consistently. 
The new `DataSourceExec` is paired with a `DataSource` trait that encapsulates 
format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in a pluggable way.

Review Comment:
   ```suggestion
   As mentioned, DataFusion now uses a unified `DataSourceExec` for reading 
tables, which is both a breaking change and a feature. *Why is this important?* 
The new approach simplifies how custom table providers are integrated and 
optimized. Namely, the optimizer can treat file scans uniformly and push down 
filters/limits more consistently when there is one execution plan that handles 
all data sources. The new `DataSourceExec` is paired with a `DataSource` trait 
that encapsulates format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in 
a pluggable way.
   ```



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.

Review Comment:
   ```suggestion
   - **[Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We began overhauling DataFusion’s approach 
to error handling. In this release, a new error variant 
`DataFusionError::Collection` (and related mechanisms) has been introduced to 
aggregate multiple errors into one. This is part of a broader effort to provide 
richer error context and reduce internal panics. As a result, some error types 
or messages have changed. Downstream code that matches on specific 
`DataFusionError` variants might need adjustment.
   ```
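
For readers wondering what "aggregating multiple errors into one" means for downstream matching code, here is a minimal, self-contained sketch. `SketchError`, `leaves`, and `count_plan_errors` are hypothetical names invented for illustration; this is not DataFusion's actual `DataFusionError` definition. It only shows the general pattern of a collection variant that wraps other errors and may nest.

```rust
use std::fmt;

/// Hypothetical stand-in for an error enum with a "collection" variant.
/// Illustrative only; not the real `DataFusionError`.
#[derive(Debug)]
enum SketchError {
    Plan(String),
    Execution(String),
    /// Several errors reported together as one value.
    Collection(Vec<SketchError>),
}

impl SketchError {
    /// Flatten (possibly nested) collections into a list of leaf errors.
    fn leaves(&self) -> Vec<&SketchError> {
        match self {
            SketchError::Collection(errs) => {
                errs.iter().flat_map(|e| e.leaves()).collect()
            }
            other => vec![other],
        }
    }
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Plan(msg) => write!(f, "plan error: {msg}"),
            SketchError::Execution(msg) => write!(f, "execution error: {msg}"),
            SketchError::Collection(errs) => {
                // Join the messages of all wrapped errors.
                let parts: Vec<String> = errs.iter().map(|e| e.to_string()).collect();
                write!(f, "{}", parts.join("; "))
            }
        }
    }
}

/// Code that used to match a single variant directly now has to look
/// inside the collection as well; flattening first keeps that simple.
fn count_plan_errors(err: &SketchError) -> usize {
    err.leaves()
        .iter()
        .filter(|e| matches!(e, SketchError::Plan(_)))
        .count()
}
```

The point for the blog text: a `match` that only handles individual variants would silently miss errors wrapped in a collection, which is why the post warns that downstream matching code might need adjustment.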



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics 
framework**](https://github.com/apache/datafusion/issues/14429) to make error 
messages more understandable. This comes in the form of new `Diagnostic` and 
`DiagnosticEntry` types, which allow the system to attach rich context (like 
source query text spans) to error messages. In practical terms, certain planner 
errors will now point to the exact location in your SQL query that caused the 
issue. 
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` 
the error message will include the query snippet causing the error. These 
diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY`columns, ambiguous 
references, wrong number of UNION columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. Thanks to [@eliaperantoni](https://github.com/eliaperantoni).
+
+### Unified `DataSourceExec` for Table Providers
+
+As mentioned, DataFusion now uses a unified `DataSourceExec` for reading 
tables, which is both a breaking change and a feature. **Why is this 
exciting?** Because it simplifies how custom table providers are integrated and 
optimized. With one execution plan to handle all data sources, the optimizer 
can treat file scans uniformly and push down filters/limits more consistently. 
The new `DataSourceExec` is paired with a `DataSource` trait that encapsulates 
format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in a pluggable way.
+
+All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been 
migrated to this framework. This unification makes the codebase cleaner and 
sets the stage for future enhancements (like consistent metadata handling and 
limit pushdown across all formats). Check out PR 
[#14224](https://github.com/apache/datafusion/pull/14224) for design details, 
thanks to [@mertak-synnada](https://github.com/mertak-synnada) and 
[@ozankabak](https://github.com/ozankabak).
+
+### FFI Support for Scalar UDFs
+
+DataFusion’s Foreign Function Interface (FFI) has been extended to support 
[**user-defined scalar 
functions**](https://github.com/apache/datafusion/pull/14579) defined in 
external languages. In 46.0.0, you can now expose a custom scalar UDF through 
the FFI layer and use it in DataFusion as if it were built-in. This is 
particularly exciting for the **Python bindings** and other language 
integrations – it means you could define a function in Python (or C, etc.) and 
register it with DataFusion’s Rust core via the FFI crate. Thanks, 
[@timsaucer](https://github.com/timsaucer)!
+
+### Distribution Framework
+
+This release, thanks mainly to [@Fly-Style](https://github.com/Fly-Style) with 
contributions from [@ozankabak](https://github.com/ozankabak) and 
[@berkaysynnada](https://github.com/berkaysynnada), includes the initial pieces 
of a [**redesigned statistics 
framework](https://github.com/apache/datafusion/pull/14699).** DataFusion’s 
optimizer can now represent column data distributions using a new 
`Distribution` enum, instead of the old precision or range estimations. The 
supported distribution types currently include **Uniform, Gaussian (normal), 
Exponential, Bernoulli**, and an **Unknown** catch-all.
+
+For example, if a filter expression is applied to a column with a known 
uniform distribution range, the optimizer can propagate that to estimate result 
selectivity more accurately. Similarly, comparisons (=, >, etc.) on columns 
yield Bernoulli distributions (true/false probabilities) in this model
+
+This is a foundational change: the immediate user-visible effect is limited 
(the optimizer won’t yet magically become a genius), but it lays groundwork for 
more advanced query planning in the future. Over time, as these 
`Distribution`'s get integrated, DataFusion will be able to make smarter 
decisions like more aggressive parquet pruning, better join orderings, and so 
on based on data distribution assumptions. Note that for now, the framework is 
in place and being hooked up (with basic support for propagation of Uniform and 
handling of some combos), but it’s not fully exploited yet. Think of this as an 
investment in performance improvements down the line.
+
+### Aggregate Monotonicity and Window Ordering
+
+DataFusion 46.0.0 adds a new concept of 
[set](https://github.com/apache/datafusion/pull/14271#)[-monotonicity](https://github.com/apache/datafusion/blob/5210a2bac32e43dc7bf6e7e6000cdeaf2833c06e/datafusion/expr/src/udaf.rs#L1090)
 for certain transformations, which helps avoid unnecessary sort operations. In 
particular, the planner now understands when a **window function preserves 
ordering** of data (or when a sorted input remains sorted after the operation). 
For example, a window-aggregate like `MAX` on a column that is not ordered can 
be known to be still ordered. PR 
[#14271](https://github.com/apache/datafusion/pull/14271) introduced a 
“set-monotonicity” property for window functions, and a follow-up PR 
[#14813](https://github.com/apache/datafusion/pull/14813) refined the handling 
of sort order in window frames. Huge thanks to 
[@berkaysynnada](https://github.com/berkaysynnada) and 
[@mertak-synnada](https://github.com/mertak-synnada) for this feature.

Review Comment:
   ```suggestion
   DataFusion 46.0.0 adds a new concept of 
[set](https://github.com/apache/datafusion/pull/14271#)[-monotonicity](https://github.com/apache/datafusion/blob/5210a2bac32e43dc7bf6e7e6000cdeaf2833c06e/datafusion/expr/src/udaf.rs#L1090)
 for certain transformations, which helps avoid unnecessary sort operations. In 
particular, the planner now understands when a **window function introduces new 
orderings of data**. For example, DataFusion now recognizes that a 
window-aggregate like `MAX` on a column can have an ordering even if the column 
itself doesn't have an ordering (for certain window frames). PR 
[#14271](https://github.com/apache/datafusion/pull/14271) introduced a 
“set-monotonicity” property for window functions, and a follow-up PR 
[#14813](https://github.com/apache/datafusion/pull/14813) refined the handling 
of sort order in window frames. Huge thanks to 
[@berkaysynnada](https://github.com/berkaysynnada) and 
[@mertak-synnada](https://github.com/mertak-synnada) for this feature.
   ```
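
To make the "ordering even if the column itself doesn't have an ordering" claim concrete, here is a small sketch in plain Rust. It does not use any DataFusion API: `running_max` and `is_non_decreasing` are hypothetical helpers, and the ever-growing frame (conceptually `UNBOUNDED PRECEDING` to `CURRENT ROW`) is my assumption about which window frames the suggestion has in mind.

```rust
/// Running maximum: out[i] = max(input[0..=i]).
/// This mimics MAX evaluated over a window frame that only ever grows.
fn running_max(input: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(input.len());
    let mut current: Option<i64> = None;
    for &v in input {
        // The maximum of a growing set can never decrease.
        let next = match current {
            Some(m) => m.max(v),
            None => v,
        };
        current = Some(next);
        out.push(next);
    }
    out
}

/// True when every element is >= its predecessor.
fn is_non_decreasing(values: &[i64]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}
```

Because the output is non-decreasing whatever the input order, a planner that knows this property can skip a sort that would otherwise be needed on the window output.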



##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,92 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: oznur-synnada and berkaysynnada on behalf of PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This 
new version represents a significant milestone for the project, packing in a 
wide range of improvements and fixes. You can find the complete details in the 
full 
[changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md).
 We’ll highlight the most important changes below and guide you through 
upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require 
adjustments to your code:
+
+- [Unified DataSourceExec Execution 
Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 
introduces a major refactor of scan operators. The separate 
file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, 
`JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single** 
`DataSourceExec` plan. Format-specific logic is now encapsulated in new 
`DataSource` and `FileSource` traits. This change simplifies the execution 
model, but if you have code that directly references the old plan nodes, you’ll 
need to update it to use `DataSourceExec` (see the [Upgrade 
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for 
examples of the new API).
+- [**Error Handling 
Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) 
(`DataFusionError::Collection`):** We have begun overhauling DataFusion’s error 
handling. In this release, a new error variant `DataFusionError::Collection` 
(and related mechanisms) has been introduced to aggregate multiple errors into 
one. This is part of a broader effort to provide richer error context and 
reduce internal panics. As a result, some error types or messages have changed. 
Downstream code that matches on specific `DataFusionError` variants might need 
adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics 
framework**](https://github.com/apache/datafusion/issues/14429) to make error 
messages more understandable. This comes in the form of new `Diagnostic` and 
`DiagnosticEntry` types, which allow the system to attach rich context (like 
source query text spans) to error messages. In practical terms, certain planner 
errors will now point to the exact location in your SQL query that caused the 
issue. 
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` 
the error message will include the query snippet causing the error. These 
diagnostics are meant for end-users of applications built on DataFusion, 
providing clearer messages instead of generic errors. Currently, diagnostics 
cover unresolved table/column references, missing `GROUP BY`columns, ambiguous 
references, wrong number of UNION columns, type mismatches, and a few others. 
Future releases will extend this to more error types. This feature should 
greatly ease debugging of complex SQL by pinpointing errors directly in the 
query text. Thanks to [@eliaperantoni](https://github.com/eliaperantoni).
+
+### Unified `DataSourceExec` for Table Providers
+
+As mentioned, DataFusion now uses a unified `DataSourceExec` for reading 
tables, which is both a breaking change and a feature. **Why is this 
exciting?** Because it simplifies how custom table providers are integrated and 
optimized. With one execution plan to handle all data sources, the optimizer 
can treat file scans uniformly and push down filters/limits more consistently. 
The new `DataSourceExec` is paired with a `DataSource` trait that encapsulates 
format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in a pluggable way.
+
+All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been 
migrated to this framework. This unification makes the codebase cleaner and 
sets the stage for future enhancements (like consistent metadata handling and 
limit pushdown across all formats). Check out PR 
[#14224](https://github.com/apache/datafusion/pull/14224) for design details, 
thanks to [@mertak-synnada](https://github.com/mertak-synnada) and 
[@ozankabak](https://github.com/ozankabak).

Review Comment:
   ```suggestion
   All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been 
migrated to this framework. This unification makes the codebase cleaner and 
sets the stage for future enhancements (like consistent metadata handling and 
limit pushdown across all formats). Check out PR 
[#14224](https://github.com/apache/datafusion/pull/14224) for design details. 
We thank [@mertak-synnada](https://github.com/mertak-synnada) and 
[@ozankabak](https://github.com/ozankabak) for their contributions.
   ```


