alamb commented on code in PR #84:
URL: https://github.com/apache/datafusion-site/pull/84#discussion_r2210222605
##########
content/blog/2025-07-16-datafusion-48.0.0.md:
##########
@@ -0,0 +1,209 @@
+---
+layout: post
+title: Apache DataFusion 48.0.0 Released
+date: 2025-07-16
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 48.0.0**! As
always, this version packs in a wide range of
+improvements and fixes. You can find the complete details in the full
+[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md).
We’ll highlight the most
+important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 48.0.0 brings a few **breaking changes** that may require
adjustments to your code as described in
+the [Upgrade
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0).
Here are the most notable ones:
+
+
+- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion
48.0.0, the default value of this [configuration setting] is now true, and
DataFusion will collect and store statistics when a table is first created via
`CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs.
+
+[configuration setting]: https://datafusion.apache.org/user-guide/configs.html
+
+- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now
includes optional metadata, which allows
+ for carrying through Arrow field metadata to support extension types and
other uses. This means code such as
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar) => ...
+...
+}
+```
+
+Should be updated to:
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar, _metadata) => ...
+...
+}
+```
+
+- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a
`Box<WindowFunction>` instead of a `WindowFunction`
+ directly. This change was made to reduce the size of `Expr` and improve
performance when planning queries
+ (see details on [#16207](https://github.com/apache/datafusion/pull/16207)).
+
+- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata
handling and
+ prepare for extension types, UDF traits now use [FieldRef] rather than a
`DataType`
+ and nullability. `FieldRef` contains the type and nullability, and
additionally allows access to
+ metadata fields, which can be used for extension types.
+
+[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html
+
+- Physical Expression return `Field`: Similarly to UDFs, in order to prepare
for extension type support the
+ [PhysicalExpr] trait has been changed to return [Field] rather than
`DataType`. To upgrade structs which
+ implement `PhysicalExpr` you need to implement the `return_field` function.
+
+[PhysicalExpr]:
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
+[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html
+
+- `FileFormat::supports_filters_pushdown` was replaced with
`FileSource::try_pushdown_filters` to support upcoming work to push down
dynamic filters and physical filter pushdown.
+
+- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`,
`AvroExec`, `CsvExec`, and `JsonExec`
+ were deprecated in DataFusion 46 and are removed in DataFusion 48.
+
+## Performance Improvements
+
+DataFusion 48.0.0 comes with some noteworthy performance enhancements:
+
+- **Fewer unnecessary projections:** DataFusion now removes additional
unnecessary `Projection`s in queries. (PRs
[#15787](https://github.com/apache/datafusion/pull/15787),
[#15761](https://github.com/apache/datafusion/pull/15761),
+ and [#15746](https://github.com/apache/datafusion/pull/15746) by
[xudong963](https://github.com/xudong963)).
+
+- **Accelerated string functions**: The `ascii` function was optimized to
significantly improve its performance
+ (PR [#16087](https://github.com/apache/datafusion/pull/16087) by
[tlm365](https://github.com/tlm365)). The `character_length` function was
optimized resulting in
+ [up to
3x](https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984)
performance improvement (PR
[#15931](https://github.com/apache/datafusion/pull/15931) by
[Dandandan](https://github.com/Dandandan))
Review Comment:
fyi @Dandandan
##########
content/blog/2025-07-16-datafusion-48.0.0.md:
##########
@@ -0,0 +1,209 @@
+---
+layout: post
+title: Apache DataFusion 48.0.0 Released
+date: 2025-07-16
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 48.0.0**! As
always, this version packs in a wide range of
+improvements and fixes. You can find the complete details in the full
+[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md).
We’ll highlight the most
+important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 48.0.0 brings a few **breaking changes** that may require
adjustments to your code as described in
+the [Upgrade
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0).
Here are the most notable ones:
+
+
+- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion
48.0.0, the default value of this [configuration setting] is now true, and
DataFusion will collect and store statistics when a table is first created via
`CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs.
+
+[configuration setting]: https://datafusion.apache.org/user-guide/configs.html
+
+- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now
includes optional metadata, which allows
+ for carrying through Arrow field metadata to support extension types and
other uses. This means code such as
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar) => ...
+...
+}
+```
+
+Should be updated to:
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar, _metadata) => ...
+...
+}
+```
+
+- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a
`Box<WindowFunction>` instead of a `WindowFunction`
+ directly. This change was made to reduce the size of `Expr` and improve
performance when planning queries
+ (see details on [#16207](https://github.com/apache/datafusion/pull/16207)).
+
+- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata
handling and
+ prepare for extension types, UDF traits now use [FieldRef] rather than a
`DataType`
+ and nullability. `FieldRef` contains the type and nullability, and
additionally allows access to
+ metadata fields, which can be used for extension types.
+
+[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html
+
+- Physical Expression return `Field`: Similarly to UDFs, in order to prepare
for extension type support the
+ [PhysicalExpr] trait has been changed to return [Field] rather than
`DataType`. To upgrade structs which
+ implement `PhysicalExpr` you need to implement the `return_field` function.
+
+[PhysicalExpr]:
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
+[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html
+
+- `FileFormat::supports_filters_pushdown` was replaced with
`FileSource::try_pushdown_filters` to support upcoming work to push down
dynamic filters and physical filter pushdown.
+
+- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`,
`AvroExec`, `CsvExec`, and `JsonExec`
+ were deprecated in DataFusion 46 and are removed in DataFusion 48.
+
+## Performance Improvements
+
+DataFusion 48.0.0 comes with some noteworthy performance enhancements:
+
+- **Fewer unnecessary projections:** DataFusion now removes additional
unnecessary `Projection`s in queries. (PRs
[#15787](https://github.com/apache/datafusion/pull/15787),
[#15761](https://github.com/apache/datafusion/pull/15761),
+ and [#15746](https://github.com/apache/datafusion/pull/15746) by
[xudong963](https://github.com/xudong963)).
+
+- **Accelerated string functions**: The `ascii` function was optimized to
significantly improve its performance
+ (PR [#16087](https://github.com/apache/datafusion/pull/16087) by
[tlm365](https://github.com/tlm365)). The `character_length` function was
optimized resulting in
Review Comment:
fyi @tlm365
##########
content/blog/2025-07-16-datafusion-48.0.0.md:
##########
@@ -0,0 +1,209 @@
+---
+layout: post
+title: Apache DataFusion 48.0.0 Released
+date: 2025-07-16
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 48.0.0**! As
always, this version packs in a wide range of
+improvements and fixes. You can find the complete details in the full
+[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md).
We’ll highlight the most
+important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 48.0.0 brings a few **breaking changes** that may require
adjustments to your code as described in
+the [Upgrade
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0).
Here are the most notable ones:
+
+
+- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion
48.0.0, the default value of this [configuration setting] is now true, and
DataFusion will collect and store statistics when a table is first created via
`CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs.
+
+[configuration setting]: https://datafusion.apache.org/user-guide/configs.html
+
+- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now
includes optional metadata, which allows
+ for carrying through Arrow field metadata to support extension types and
other uses. This means code such as
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar) => ...
+...
+}
+```
+
+Should be updated to:
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar, _metadata) => ...
+...
+}
+```
+
+- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a
`Box<WindowFunction>` instead of a `WindowFunction`
+ directly. This change was made to reduce the size of `Expr` and improve
performance when planning queries
+ (see details on [#16207](https://github.com/apache/datafusion/pull/16207)).
+
+- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata
handling and
+ prepare for extension types, UDF traits now use [FieldRef] rather than a
`DataType`
+ and nullability. `FieldRef` contains the type and nullability, and
additionally allows access to
+ metadata fields, which can be used for extension types.
+
+[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html
+
+- Physical Expression return `Field`: Similarly to UDFs, in order to prepare
for extension type support the
+ [PhysicalExpr] trait has been changed to return [Field] rather than
`DataType`. To upgrade structs which
+ implement `PhysicalExpr` you need to implement the `return_field` function.
+
+[PhysicalExpr]:
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
+[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html
+
+- `FileFormat::supports_filters_pushdown` was replaced with
`FileSource::try_pushdown_filters` to support upcoming work to push down
dynamic filters and physical filter pushdown.
+
+- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`,
`AvroExec`, `CsvExec`, and `JsonExec`
+ were deprecated in DataFusion 46 and are removed in DataFusion 48.
+
+## Performance Improvements
+
+DataFusion 48.0.0 comes with some noteworthy performance enhancements:
+
+- **Fewer unnecessary projections:** DataFusion now removes additional
unnecessary `Projection`s in queries. (PRs
[#15787](https://github.com/apache/datafusion/pull/15787),
[#15761](https://github.com/apache/datafusion/pull/15761),
+ and [#15746](https://github.com/apache/datafusion/pull/15746) by
[xudong963](https://github.com/xudong963)).
+
+- **Accelerated string functions**: The `ascii` function was optimized to
significantly improve its performance
+ (PR [#16087](https://github.com/apache/datafusion/pull/16087) by
[tlm365](https://github.com/tlm365)). The `character_length` function was
optimized resulting in
+ [up to
3x](https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984)
performance improvement (PR
[#15931](https://github.com/apache/datafusion/pull/15931) by
[Dandandan](https://github.com/Dandandan))
+
+- **Constant aggregate window expressions:** For unbounded aggregate window
functions the result is the
+ same for all rows within a partition. DataFusion 48.0.0 avoids unnecessary
computation for such queries, resulting in [improved performance by
5.6x](https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865)
+ (PR [#16234](https://github.com/apache/datafusion/pull/16234) by
[suibianwanwank](https://github.com/suibianwanwank))
+
+## Highlighted New Features
+
+### New `datafusion-spark` crate
+
+The DataFusion community has requested [Apache Spark]-compatible functions for
many years, but the current builtin function library is most similar to
Postgresql, which leads to friction. Unfortunately, there are even functions
with the same name but different signatures and/or return types in the two
systems.
+
+One of the many uses of DataFusion is to enhance (e.g. [Apache DataFusion
Comet](https://github.com/apache/datafusion-comet))
+or replace (e.g. [Sail](https://github.com/lakehq/sail)) [Apache
Spark](https://spark.apache.org/). To
+support the community requests and the use cases mentioned above, we have
introduced a new
+[datafusion-spark] crate for DataFusion with spark-compatible functions so the
+community can collaborate to build this shared resource. There are several
hundred functions to implement, and we are looking for help to [complete
datafusion-spark Spark Compatible Functions].
+
+[datafusion-spark]: https://crates.io/crates/datafusion-spark
+[Apache Spark]: https://spark.apache.org
+
+To register all functions in `datafusion-spark` you can use:
+```Rust
+ // Create a new session context
+ let mut ctx = SessionContext::new();
+ // register all spark functions with the context
+ datafusion_spark::register_all(&mut ctx)?;
+ // run a query. Note the `sha2` function is now available which
+ // has Spark semantics
+ let df = ctx.sql("SELECT sha2('The input String', 256)").await?;
+ ...
+}
+```
+Or, to use an individual function, you can do:
+```Rust
+use datafusion_expr::{col, lit};
+use datafusion_spark::expr_fn::sha2;
+// Create the expression `sha2(my_data, 256)`
+let expr = sha2(col("my_data"), lit(256));
+...
+```
+Thanks to [shehabgamin](https://github.com/shehabgamin) for the initial PR
[#15168](https://github.com/apache/datafusion/pull/15168)
Review Comment:
fyi @shehabgamin
##########
content/blog/2025-07-16-datafusion-48.0.0.md:
##########
@@ -0,0 +1,209 @@
+---
+layout: post
+title: Apache DataFusion 48.0.0 Released
+date: 2025-07-16
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 48.0.0**! As
always, this version packs in a wide range of
+improvements and fixes. You can find the complete details in the full
+[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md).
We’ll highlight the most
+important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 48.0.0 brings a few **breaking changes** that may require
adjustments to your code as described in
+the [Upgrade
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0).
Here are the most notable ones:
+
+
+- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion
48.0.0, the default value of this [configuration setting] is now true, and
DataFusion will collect and store statistics when a table is first created via
`CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs.
+
+[configuration setting]: https://datafusion.apache.org/user-guide/configs.html
+
+- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now
includes optional metadata, which allows
+ for carrying through Arrow field metadata to support extension types and
other uses. This means code such as
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar) => ...
+...
+}
+```
+
+Should be updated to:
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar, _metadata) => ...
+...
+}
+```
+
+- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a
`Box<WindowFunction>` instead of a `WindowFunction`
+ directly. This change was made to reduce the size of `Expr` and improve
performance when planning queries
+ (see details on [#16207](https://github.com/apache/datafusion/pull/16207)).
+
+- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata
handling and
+ prepare for extension types, UDF traits now use [FieldRef] rather than a
`DataType`
+ and nullability. `FieldRef` contains the type and nullability, and
additionally allows access to
+ metadata fields, which can be used for extension types.
+
+[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html
+
+- Physical Expression return `Field`: Similarly to UDFs, in order to prepare
for extension type support the
+ [PhysicalExpr] trait has been changed to return [Field] rather than
`DataType`. To upgrade structs which
+ implement `PhysicalExpr` you need to implement the `return_field` function.
+
+[PhysicalExpr]:
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
+[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html
+
+- `FileFormat::supports_filters_pushdown` was replaced with
`FileSource::try_pushdown_filters` to support upcoming work to push down
dynamic filters and physical filter pushdown.
+
+- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`,
`AvroExec`, `CsvExec`, and `JsonExec`
+ were deprecated in DataFusion 46 and are removed in DataFusion 48.
+
+## Performance Improvements
+
+DataFusion 48.0.0 comes with some noteworthy performance enhancements:
+
+- **Fewer unnecessary projections:** DataFusion now removes additional
unnecessary `Projection`s in queries. (PRs
[#15787](https://github.com/apache/datafusion/pull/15787),
[#15761](https://github.com/apache/datafusion/pull/15761),
+ and [#15746](https://github.com/apache/datafusion/pull/15746) by
[xudong963](https://github.com/xudong963)).
+
+- **Accelerated string functions**: The `ascii` function was optimized to
significantly improve its performance
+ (PR [#16087](https://github.com/apache/datafusion/pull/16087) by
[tlm365](https://github.com/tlm365)). The `character_length` function was
optimized resulting in
+ [up to
3x](https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984)
performance improvement (PR
[#15931](https://github.com/apache/datafusion/pull/15931) by
[Dandandan](https://github.com/Dandandan))
+
+- **Constant aggregate window expressions:** For unbounded aggregate window
functions the result is the
+ same for all rows within a partition. DataFusion 48.0.0 avoids unnecessary
computation for such queries, resulting in [improved performance by
5.6x](https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865)
+ (PR [#16234](https://github.com/apache/datafusion/pull/16234) by
[suibianwanwank](https://github.com/suibianwanwank))
+
+## Highlighted New Features
+
+### New `datafusion-spark` crate
+
+The DataFusion community has requested [Apache Spark]-compatible functions for
many years, but the current builtin function library is most similar to
Postgresql, which leads to friction. Unfortunately, there are even functions
with the same name but different signatures and/or return types in the two
systems.
+
+One of the many uses of DataFusion is to enhance (e.g. [Apache DataFusion
Comet](https://github.com/apache/datafusion-comet))
+or replace (e.g. [Sail](https://github.com/lakehq/sail)) [Apache
Spark](https://spark.apache.org/). To
+support the community requests and the use cases mentioned above, we have
introduced a new
+[datafusion-spark] crate for DataFusion with spark-compatible functions so the
+community can collaborate to build this shared resource. There are several
hundred functions to implement, and we are looking for help to [complete
datafusion-spark Spark Compatible Functions].
+
+[datafusion-spark]: https://crates.io/crates/datafusion-spark
+[Apache Spark]: https://spark.apache.org
+
+To register all functions in `datafusion-spark` you can use:
+```Rust
+ // Create a new session context
+ let mut ctx = SessionContext::new();
+ // register all spark functions with the context
+ datafusion_spark::register_all(&mut ctx)?;
+ // run a query. Note the `sha2` function is now available which
+ // has Spark semantics
+ let df = ctx.sql("SELECT sha2('The input String', 256)").await?;
+ ...
+}
+```
+Or, to use an individual function, you can do:
+```Rust
+use datafusion_expr::{col, lit};
+use datafusion_spark::expr_fn::sha2;
+// Create the expression `sha2(my_data, 256)`
+let expr = sha2(col("my_data"), lit(256));
+...
+```
+Thanks to [shehabgamin](https://github.com/shehabgamin) for the initial PR
[#15168](https://github.com/apache/datafusion/pull/15168)
+and many others for their help adding additional functions. Please consider
+helping [complete datafusion-spark Spark Compatible Functions].
+
+[Complete datafusion-spark Spark Compatible Functions]:
https://github.com/apache/datafusion/issues/15914
+
+### `ORDER BY ALL sql` support
+
+Inspired by
[DuckDB](https://duckdb.org/docs/stable/sql/query_syntax/orderby.html#order-by-all-examples),
DataFusion 48.0.0 adds support for `ORDER BY ALL`. This allows for easy
ordering of all columns in a query:
+
+```sql
+> set datafusion.sql_parser.dialect = 'DuckDB';
+0 row(s) fetched.
+> CREATE OR REPLACE TABLE addresses AS
+ SELECT '123 Quack Blvd' AS address, 'DuckTown' AS city, '11111' AS zip
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'DuckTown', '11111'
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111'
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111-0001';
+0 row(s) fetched.
+> SELECT * FROM addresses ORDER BY ALL;
++------------------------+-----------+------------+
+| address | city | zip |
++------------------------+-----------+------------+
+| 111 Duck Duck Goose Ln | Duck Town | 11111 |
+| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 |
+| 111 Duck Duck Goose Ln | DuckTown | 11111 |
+| 123 Quack Blvd | DuckTown | 11111 |
++------------------------+-----------+------------+
+4 row(s) fetched.
+```
+Thanks to [PokIsemaine](https://github.com/PokIsemaine) for PR
[#15772](https://github.com/apache/datafusion/pull/15772)
Review Comment:
fyi @PokIsemaine
##########
content/blog/2025-07-16-datafusion-48.0.0.md:
##########
@@ -0,0 +1,209 @@
+---
+layout: post
+title: Apache DataFusion 48.0.0 Released
+date: 2025-07-16
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 48.0.0**! As
always, this version packs in a wide range of
+improvements and fixes. You can find the complete details in the full
+[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md).
We’ll highlight the most
+important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 48.0.0 brings a few **breaking changes** that may require
adjustments to your code as described in
+the [Upgrade
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0).
Here are the most notable ones:
+
+
+- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion
48.0.0, the default value of this [configuration setting] is now true, and
DataFusion will collect and store statistics when a table is first created via
`CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs.
+
+[configuration setting]: https://datafusion.apache.org/user-guide/configs.html
+
+- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now
includes optional metadata, which allows
+ for carrying through Arrow field metadata to support extension types and
other uses. This means code such as
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar) => ...
+...
+}
+```
+
+Should be updated to:
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar, _metadata) => ...
+...
+}
+```
+
+- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a
`Box<WindowFunction>` instead of a `WindowFunction`
+ directly. This change was made to reduce the size of `Expr` and improve
performance when planning queries
+ (see details on [#16207](https://github.com/apache/datafusion/pull/16207)).
+
+- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata
handling and
+ prepare for extension types, UDF traits now use [FieldRef] rather than a
`DataType`
+ and nullability. `FieldRef` contains the type and nullability, and
additionally allows access to
+ metadata fields, which can be used for extension types.
+
+[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html
+
+- Physical Expression return `Field`: Similarly to UDFs, in order to prepare
for extension type support the
+ [PhysicalExpr] trait has been changed to return [Field] rather than
`DataType`. To upgrade structs which
+ implement `PhysicalExpr` you need to implement the `return_field` function.
+
+[PhysicalExpr]:
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
+[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html
+
+- `FileFormat::supports_filters_pushdown` was replaced with
`FileSource::try_pushdown_filters` to support upcoming work to push down
dynamic filters and physical filter pushdown.
+
+- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`,
`AvroExec`, `CsvExec`, and `JsonExec`
+ were deprecated in DataFusion 46 and are removed in DataFusion 48.
+
+## Performance Improvements
+
+DataFusion 48.0.0 comes with some noteworthy performance enhancements:
+
+- **Fewer unnecessary projections:** DataFusion now removes additional
unnecessary `Projection`s in queries. (PRs
[#15787](https://github.com/apache/datafusion/pull/15787),
[#15761](https://github.com/apache/datafusion/pull/15761),
Review Comment:
fyi @xudong963
##########
content/blog/2025-07-16-datafusion-48.0.0.md:
##########
@@ -0,0 +1,209 @@
+---
+layout: post
+title: Apache DataFusion 48.0.0 Released
+date: 2025-07-16
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 48.0.0**! As
always, this version packs in a wide range of
+improvements and fixes. You can find the complete details in the full
+[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md).
We’ll highlight the most
+important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 48.0.0 brings a few **breaking changes** that may require
adjustments to your code as described in
+the [Upgrade
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0).
Here are the most notable ones:
+
+
+- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion
48.0.0, the default value of this [configuration setting] is now true, and
DataFusion will collect and store statistics when a table is first created via
`CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs.
+
+[configuration setting]: https://datafusion.apache.org/user-guide/configs.html
+
+- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now
includes optional metadata, which allows
+ for carrying through Arrow field metadata to support extension types and
other uses. This means code such as
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar) => ...
+...
+}
+```
+
+Should be updated to:
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar, _metadata) => ...
+...
+}
+```
+
+- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a
`Box<WindowFunction>` instead of a `WindowFunction`
+ directly. This change was made to reduce the size of `Expr` and improve
performance when planning queries
+ (see details on [#16207](https://github.com/apache/datafusion/pull/16207)).
+
+- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata
handling and
+ prepare for extension types, UDF traits now use [FieldRef] rather than a
`DataType`
+ and nullability. `FieldRef` contains the type and nullability, and
additionally allows access to
+ metadata fields, which can be used for extension types.
+
+[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html
+
+- Physical Expression return `Field`: Similarly to UDFs, in order to prepare
for extension type support the
+ [PhysicalExpr] trait has been changed to return [Field] rather than
`DataType`. To upgrade structs which
+ implement `PhysicalExpr` you need to implement the `return_field` function.
+
+[PhysicalExpr]:
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
+[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html
+
+- `FileFormat::supports_filters_pushdown` was replaced with
`FileSource::try_pushdown_filters` to support upcoming work to push down
dynamic filters and physical filter pushdown.
+
+- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`,
`AvroExec`, `CsvExec`, and `JsonExec`
+ were deprecated in DataFusion 46 and are removed in DataFusion 48.
+
+## Performance Improvements
+
+DataFusion 48.0.0 comes with some noteworthy performance enhancements:
+
+- **Fewer unnecessary projections:** DataFusion now removes additional
unnecessary `Projection`s in queries. (PRs
[#15787](https://github.com/apache/datafusion/pull/15787),
[#15761](https://github.com/apache/datafusion/pull/15761),
+ and [#15746](https://github.com/apache/datafusion/pull/15746) by
[xudong963](https://github.com/xudong963)).
+
+- **Accelerated string functions**: The `ascii` function was optimized to
significantly improve its performance
+ (PR [#16087](https://github.com/apache/datafusion/pull/16087) by
[tlm365](https://github.com/tlm365)). The `character_length` function was
optimized resulting in
+ [up to
3x](https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984)
performance improvement (PR
[#15931](https://github.com/apache/datafusion/pull/15931) by
[Dandandan](https://github.com/Dandandan))
+
+- **Constant aggregate window expressions:** For unbounded aggregate window
functions the result is the
+ same for all rows within a partition. DataFusion 48.0.0 avoids unnecessary
computation for such queries, resulting in [improved performance by
5.6x](https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865)
+ (PR [#16234](https://github.com/apache/datafusion/pull/16234) by
[suibianwanwank](https://github.com/suibianwanwank))
+
+## Highlighted New Features
+
+### New `datafusion-spark` crate
+
+The DataFusion community has requested [Apache Spark]-compatible functions for
many years, but the current builtin function library is most similar to
Postgresql, which leads to friction. Unfortunately, there are even functions
with the same name but different signatures and/or return types in the two
systems.
+
+One of the many uses of DataFusion is to enhance (e.g. [Apache DataFusion
Comet](https://github.com/apache/datafusion-comet))
+or replace (e.g. [Sail](https://github.com/lakehq/sail)) [Apache
Spark](https://spark.apache.org/). To
+support the community requests and the use cases mentioned above, we have
introduced a new
+[datafusion-spark] crate for DataFusion with spark-compatible functions so the
+community can collaborate to build this shared resource. There are several
hundred functions to implement, and we are looking for help to [complete
datafusion-spark Spark Compatible Functions].
+
+[datafusion-spark]: https://crates.io/crates/datafusion-spark
+[Apache Spark]: https://spark.apache.org
+
+To register all functions in `datafusion-spark` you can use:
+```Rust
+ // Create a new session context
+ let mut ctx = SessionContext::new();
+ // register all spark functions with the context
+ datafusion_spark::register_all(&mut ctx)?;
+ // run a query. Note the `sha2` function is now available which
+ // has Spark semantics
+ let df = ctx.sql("SELECT sha2('The input String', 256)").await?;
+ ...
+}
+```
+Or, to use an individual function, you can do:
+```Rust
+use datafusion_expr::{col, lit};
+use datafusion_spark::expr_fn::sha2;
+// Create the expression `sha2(my_data, 256)`
+let expr = sha2(col("my_data"), lit(256));
+...
+```
+Thanks to [shehabgamin](https://github.com/shehabgamin) for the initial PR
[#15168](https://github.com/apache/datafusion/pull/15168)
+and many others for their help adding additional functions. Please consider
+helping [complete datafusion-spark Spark Compatible Functions].
+
+[Complete datafusion-spark Spark Compatible Functions]:
https://github.com/apache/datafusion/issues/15914
+
+### `ORDER BY ALL sql` support
+
+Inspired by
[DuckDB](https://duckdb.org/docs/stable/sql/query_syntax/orderby.html#order-by-all-examples),
DataFusion 48.0.0 adds support for `ORDER BY ALL`. This allows for easy
ordering of all columns in a query:
+
+```sql
+> set datafusion.sql_parser.dialect = 'DuckDB';
+0 row(s) fetched.
+> CREATE OR REPLACE TABLE addresses AS
+ SELECT '123 Quack Blvd' AS address, 'DuckTown' AS city, '11111' AS zip
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'DuckTown', '11111'
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111'
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111-0001';
+0 row(s) fetched.
+> SELECT * FROM addresses ORDER BY ALL;
++------------------------+-----------+------------+
+| address | city | zip |
++------------------------+-----------+------------+
+| 111 Duck Duck Goose Ln | Duck Town | 11111 |
+| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 |
+| 111 Duck Duck Goose Ln | DuckTown | 11111 |
+| 123 Quack Blvd | DuckTown | 11111 |
++------------------------+-----------+------------+
+4 row(s) fetched.
+```
+Thanks to [PokIsemaine](https://github.com/PokIsemaine) for PR
[#15772](https://github.com/apache/datafusion/pull/15772)
+
+### FFI Support for `AggregateUDF` and `WindowUDF`
+
+This improvement allows for using user defined aggregate and user defined
window functions across FFI boundaries, which enables shared libraries to pass
functions back and forth. This feature unlocks:
+
+- Modules to provide DataFusion based FFI aggregates that can be reused in
projects such as
[datafusion-python](https://github.com/apache/datafusion-python)
+
+- Using the same aggregate and window functions without recompiling with
different DataFusion versions.
+
+This completes the work to add support for all UDF types to DataFusion's FFI
bindings. Thanks to [timsaucer](https://github.com/timsaucer)
Review Comment:
fyi @timsaucer
##########
content/blog/2025-07-16-datafusion-48.0.0.md:
##########
@@ -0,0 +1,209 @@
+---
+layout: post
+title: Apache DataFusion 48.0.0 Released
+date: 2025-07-16
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 48.0.0**! As
always, this version packs in a wide range of
+improvements and fixes. You can find the complete details in the full
+[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md).
We’ll highlight the most
+important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 48.0.0 brings a few **breaking changes** that may require
adjustments to your code as described in
+the [Upgrade
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0).
Here are the most notable ones:
+
+
+- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion
48.0.0, the default value of this [configuration setting] is now true, and
DataFusion will collect and store statistics when a table is first created via
`CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs.
+
+[configuration setting]: https://datafusion.apache.org/user-guide/configs.html
+
+- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now
includes optional metadata, which allows
+ for carrying through Arrow field metadata to support extension types and
other uses. This means code such as
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar) => ...
+...
+}
+```
+
+Should be updated to:
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar, _metadata) => ...
+...
+}
+```
+
+- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a
`Box<WindowFunction>` instead of a `WindowFunction`
+ directly. This change was made to reduce the size of `Expr` and improve
performance when planning queries
+ (see details on [#16207](https://github.com/apache/datafusion/pull/16207)).
+
+- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata
handling and
+ prepare for extension types, UDF traits now use [FieldRef] rather than a
`DataType`
+ and nullability. `FieldRef` contains the type and nullability, and
additionally allows access to
+ metadata fields, which can be used for extension types.
+
+[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html
+
+- Physical Expression return `Field`: Similarly to UDFs, in order to prepare
for extension type support the
+ [PhysicalExpr] trait has been changed to return [Field] rather than
`DataType`. To upgrade structs which
+ implement `PhysicalExpr` you need to implement the `return_field` function.
+
+[PhysicalExpr]:
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
+[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html
+
+- `FileFormat::supports_filters_pushdown` was replaced with
`FileSource::try_pushdown_filters` to support upcoming work to push down
dynamic filters and physical filter pushdown.
+
+- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`,
`AvroExec`, `CsvExec`, and `JsonExec`
+ were deprecated in DataFusion 46 and are removed in DataFusion 48.
+
+## Performance Improvements
+
+DataFusion 48.0.0 comes with some noteworthy performance enhancements:
+
+- **Fewer unnecessary projections:** DataFusion now removes additional
unnecessary `Projection`s in queries. (PRs
[#15787](https://github.com/apache/datafusion/pull/15787),
[#15761](https://github.com/apache/datafusion/pull/15761),
+ and [#15746](https://github.com/apache/datafusion/pull/15746) by
[xudong963](https://github.com/xudong963)).
+
+- **Accelerated string functions**: The `ascii` function was optimized to
significantly improve its performance
+ (PR [#16087](https://github.com/apache/datafusion/pull/16087) by
[tlm365](https://github.com/tlm365)). The `character_length` function was
optimized resulting in
+ [up to
3x](https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984)
performance improvement (PR
[#15931](https://github.com/apache/datafusion/pull/15931) by
[Dandandan](https://github.com/Dandandan))
+
+- **Constant aggregate window expressions:** For unbounded aggregate window
functions the result is the
+ same for all rows within a partition. DataFusion 48.0.0 avoids unnecessary
computation for such queries, resulting in [improved performance by
5.6x](https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865)
+ (PR [#16234](https://github.com/apache/datafusion/pull/16234) by
[suibianwanwank](https://github.com/suibianwanwank))
Review Comment:
FYI @suibianwanwank
##########
content/blog/2025-07-16-datafusion-48.0.0.md:
##########
@@ -0,0 +1,209 @@
+---
+layout: post
+title: Apache DataFusion 48.0.0 Released
+date: 2025-07-16
+author: PMC
+categories: [ release ]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+
+We’re excited to announce the release of **Apache DataFusion 48.0.0**! As
always, this version packs in a wide range of
+improvements and fixes. You can find the complete details in the full
+[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md).
We’ll highlight the most
+important changes below and guide you through upgrading.
+
+## Breaking Changes
+
+DataFusion 48.0.0 brings a few **breaking changes** that may require
adjustments to your code as described in
+the [Upgrade
Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0).
Here are the most notable ones:
+
+
+- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion
48.0.0, the default value of this [configuration setting] is now true, and
DataFusion will collect and store statistics when a table is first created via
`CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs.
+
+[configuration setting]: https://datafusion.apache.org/user-guide/configs.html
+
+- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now
includes optional metadata, which allows
+ for carrying through Arrow field metadata to support extension types and
other uses. This means code such as
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar) => ...
+...
+}
+```
+
+Should be updated to:
+
+```rust
+match expr {
+...
+ Expr::Literal(scalar, _metadata) => ...
+...
+}
+```
+
+- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a
`Box<WindowFunction>` instead of a `WindowFunction`
+ directly. This change was made to reduce the size of `Expr` and improve
performance when planning queries
+ (see details on [#16207](https://github.com/apache/datafusion/pull/16207)).
+
+- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata
handling and
+ prepare for extension types, UDF traits now use [FieldRef] rather than a
`DataType`
+ and nullability. `FieldRef` contains the type and nullability, and
additionally allows access to
+ metadata fields, which can be used for extension types.
+
+[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html
+
+- Physical Expression return `Field`: Similarly to UDFs, in order to prepare
for extension type support the
+ [PhysicalExpr] trait has been changed to return [Field] rather than
`DataType`. To upgrade structs which
+ implement `PhysicalExpr` you need to implement the `return_field` function.
+
+[PhysicalExpr]:
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html
+[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html
+
+- `FileFormat::supports_filters_pushdown` was replaced with
`FileSource::try_pushdown_filters` to support upcoming work to push down
dynamic filters and physical filter pushdown.
+
+- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`,
`AvroExec`, `CsvExec`, and `JsonExec`
+ were deprecated in DataFusion 46 and are removed in DataFusion 48.
+
+## Performance Improvements
+
+DataFusion 48.0.0 comes with some noteworthy performance enhancements:
+
+- **Fewer unnecessary projections:** DataFusion now removes additional
unnecessary `Projection`s in queries. (PRs
[#15787](https://github.com/apache/datafusion/pull/15787),
[#15761](https://github.com/apache/datafusion/pull/15761),
+ and [#15746](https://github.com/apache/datafusion/pull/15746) by
[xudong963](https://github.com/xudong963)).
+
+- **Accelerated string functions**: The `ascii` function was optimized to
significantly improve its performance
+ (PR [#16087](https://github.com/apache/datafusion/pull/16087) by
[tlm365](https://github.com/tlm365)). The `character_length` function was
optimized resulting in
+ [up to
3x](https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984)
performance improvement (PR
[#15931](https://github.com/apache/datafusion/pull/15931) by
[Dandandan](https://github.com/Dandandan))
+
+- **Constant aggregate window expressions:** For unbounded aggregate window
functions the result is the
+ same for all rows within a partition. DataFusion 48.0.0 avoids unnecessary
computation for such queries, resulting in [improved performance by
5.6x](https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865)
+ (PR [#16234](https://github.com/apache/datafusion/pull/16234) by
[suibianwanwank](https://github.com/suibianwanwank))
+
+## Highlighted New Features
+
+### New `datafusion-spark` crate
+
+The DataFusion community has requested [Apache Spark]-compatible functions for
many years, but the current builtin function library is most similar to
Postgresql, which leads to friction. Unfortunately, there are even functions
with the same name but different signatures and/or return types in the two
systems.
+
+One of the many uses of DataFusion is to enhance (e.g. [Apache DataFusion
Comet](https://github.com/apache/datafusion-comet))
+or replace (e.g. [Sail](https://github.com/lakehq/sail)) [Apache
Spark](https://spark.apache.org/). To
+support the community requests and the use cases mentioned above, we have
introduced a new
+[datafusion-spark] crate for DataFusion with spark-compatible functions so the
+community can collaborate to build this shared resource. There are several
hundred functions to implement, and we are looking for help to [complete
datafusion-spark Spark Compatible Functions].
+
+[datafusion-spark]: https://crates.io/crates/datafusion-spark
+[Apache Spark]: https://spark.apache.org
+
+To register all functions in `datafusion-spark` you can use:
+```Rust
+ // Create a new session context
+ let mut ctx = SessionContext::new();
+ // register all spark functions with the context
+ datafusion_spark::register_all(&mut ctx)?;
+ // run a query. Note the `sha2` function is now available which
+ // has Spark semantics
+ let df = ctx.sql("SELECT sha2('The input String', 256)").await?;
+ ...
+}
+```
+Or, to use an individual function, you can do:
+```Rust
+use datafusion_expr::{col, lit};
+use datafusion_spark::expr_fn::sha2;
+// Create the expression `sha2(my_data, 256)`
+let expr = sha2(col("my_data"), lit(256));
+...
+```
+Thanks to [shehabgamin](https://github.com/shehabgamin) for the initial PR
[#15168](https://github.com/apache/datafusion/pull/15168)
+and many others for their help adding additional functions. Please consider
+helping [complete datafusion-spark Spark Compatible Functions].
+
+[Complete datafusion-spark Spark Compatible Functions]:
https://github.com/apache/datafusion/issues/15914
+
+### `ORDER BY ALL sql` support
+
+Inspired by
[DuckDB](https://duckdb.org/docs/stable/sql/query_syntax/orderby.html#order-by-all-examples),
DataFusion 48.0.0 adds support for `ORDER BY ALL`. This allows for easy
ordering of all columns in a query:
+
+```sql
+> set datafusion.sql_parser.dialect = 'DuckDB';
+0 row(s) fetched.
+> CREATE OR REPLACE TABLE addresses AS
+ SELECT '123 Quack Blvd' AS address, 'DuckTown' AS city, '11111' AS zip
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'DuckTown', '11111'
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111'
+ UNION ALL
+ SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111-0001';
+0 row(s) fetched.
+> SELECT * FROM addresses ORDER BY ALL;
++------------------------+-----------+------------+
+| address | city | zip |
++------------------------+-----------+------------+
+| 111 Duck Duck Goose Ln | Duck Town | 11111 |
+| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 |
+| 111 Duck Duck Goose Ln | DuckTown | 11111 |
+| 123 Quack Blvd | DuckTown | 11111 |
++------------------------+-----------+------------+
+4 row(s) fetched.
+```
+Thanks to [PokIsemaine](https://github.com/PokIsemaine) for PR
[#15772](https://github.com/apache/datafusion/pull/15772)
+
+### FFI Support for `AggregateUDF` and `WindowUDF`
+
+This improvement allows for using user defined aggregate and user defined
window functions across FFI boundaries, which enables shared libraries to pass
functions back and forth. This feature unlocks:
+
+- Modules to provide DataFusion based FFI aggregates that can be reused in
projects such as
[datafusion-python](https://github.com/apache/datafusion-python)
+
+- Using the same aggregate and window functions without recompiling with
different DataFusion versions.
+
+This completes the work to add support for all UDF types to DataFusion's FFI
bindings. Thanks to [timsaucer](https://github.com/timsaucer)
+for PRs [#16261](https://github.com/apache/datafusion/pull/16261) and
[#14775](https://github.com/apache/datafusion/pull/14775).
+
+### Reduced size of `Expr` struct
+
+The [Expr] struct is widely used across the DataFusion and downstream
codebases. By `Box`ing `WindowFunction`s, we reduced the size of `Expr` by
almost 50%, from `272` to `144` bytes. This reduction improved planning times
between 10% and 20% and reduced memory usage. Thanks to
[hendrikmakait](https://github.com/hendrikmakait) for
Review Comment:
fyi @hendrikmakait
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]