kevinjqliu commented on code in PR #64:
URL: https://github.com/apache/datafusion-site/pull/64#discussion_r2012878189
##########
content/blog/2025-03-24-datafusion-46.0.0.md:
##########
@@ -0,0 +1,96 @@
+---
+layout: post
+title: Apache DataFusion 46.0.0 Released
+date: 2025-03-24
+author: Oznur Hanci and Berkay Sahin on behalf of the PMC
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+We’re excited to announce the release of **Apache DataFusion 46.0.0**! This new version represents a significant milestone for the project, packing in a wide range of improvements and fixes. You can find the complete details in the full [changelog](https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md). We’ll highlight the most important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 46.0.0 brings a few **breaking changes** that may require adjustments to your code as described in the [Upgrade Guide](https://datafusion.apache.org/library-user-guide/upgrading.html). Here are the most notable ones:
+
+- [Unified `DataSourceExec` Execution Plan](https://github.com/apache/datafusion/pull/14224#)**:** DataFusion 46.0.0 introduces a major refactor of scan operators. The separate file-format-specific execution plan nodes (`ParquetExec`, `CsvExec`, `JsonExec`, `AvroExec`, etc.) have been **deprecated and merged into a single `DataSourceExec` plan**. Format-specific logic is now encapsulated in new `DataSource` and `FileSource` traits. This change simplifies the execution model, but if you have code that directly references the old plan nodes, you’ll need to update it to use `DataSourceExec` (see the [Upgrade Guide](https://datafusion.apache.org/library-user-guide/upgrading.html) for examples of the new API).
+- [**Error Handling Improvements](https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2) (`DataFusionError::Collection`):** We began overhauling DataFusion’s approach to error handling. In this release, a new error variant `DataFusionError::Collection` (and related mechanisms) has been introduced to aggregate multiple errors into one. This is part of a broader effort to provide richer error context and reduce internal panics. As a result, some error types or messages have changed. Downstream code that matches on specific `DataFusionError` variants might need adjustment.
+
+## Highlighted New Features
+
+### Improved Diagnostics
+
+DataFusion 46.0.0 introduces a new [**SQL Diagnostics framework**](https://github.com/apache/datafusion/issues/14429) to make error messages more understandable. This comes in the form of new `Diagnostic` and `DiagnosticEntry` types, which allow the system to attach rich context (like source query text spans) to error messages. In practical terms, certain planner errors will now point to the exact location in your SQL query that caused the issue.
+
+For example, if you reference an unknown table or miss a column in `GROUP BY` the error message will include the query snippet causing the error. These diagnostics are meant for end-users of applications built on DataFusion, providing clearer messages instead of generic errors. Here’s an example:
+
+<img src="/blog/images/datafusion-46.0.0/diagnostic-example.png" alt="Parquet pruning pipeline in DataFusion" width="80%" class="img-responsive">

Review Comment:
   is `alt="Parquet pruning pipeline in DataFusion"` a copy/paste error?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org