This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git

The following commit(s) were added to refs/heads/main by this push:
     new 5a861424c8 Add change to VARCHAR in the upgrade guide (#16216)
5a861424c8 is described below

commit 5a861424c89414b40303a646e949b464b8ca5648
Author: Andrew Lamb <and...@nerdnetworks.org>
AuthorDate: Sun Jun 1 12:49:00 2025 -0400

    Add change to VARCHAR in the upgrade guide (#16216)

    * Add change to VARCHAR in the upgrade guide

    * Update docs/source/library-user-guide/upgrading.md

    Co-authored-by: Oleks V <comph...@users.noreply.github.com>

    ---------

    Co-authored-by: Oleks V <comph...@users.noreply.github.com>
---
 docs/source/library-user-guide/upgrading.md | 49 +++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/docs/source/library-user-guide/upgrading.md b/docs/source/library-user-guide/upgrading.md
index ed8fdadab2..3922e0d45d 100644
--- a/docs/source/library-user-guide/upgrading.md
+++ b/docs/source/library-user-guide/upgrading.md
@@ -21,6 +21,55 @@
 
 ## DataFusion `48.0.0`
 
+### The `VARCHAR` SQL type is now represented as `Utf8View` in Arrow.
+
+The mapping of the SQL `VARCHAR` type has been changed from `Utf8` to `Utf8View`
+which improves performance for many string operations. You can read more about
+`Utf8View` in the [DataFusion blog post on German-style strings]
+
+[datafusion blog post on german-style strings]: https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
+
+This means that when you create a table with a `VARCHAR` column, it will now use
+`Utf8View` as the underlying data type. For example:
+
+```sql
+> CREATE TABLE my_table (my_column VARCHAR);
+0 row(s) fetched.
+Elapsed 0.001 seconds.
+
+> DESCRIBE my_table;
++-------------+-----------+-------------+
+| column_name | data_type | is_nullable |
++-------------+-----------+-------------+
+| my_column   | Utf8View  | YES         |
++-------------+-----------+-------------+
+1 row(s) fetched.
+Elapsed 0.000 seconds.
+```
+
+You can restore the old behavior of using `Utf8` by changing the
+`datafusion.sql_parser.map_varchar_to_utf8view` configuration setting. For
+example
+
+```sql
+> set datafusion.sql_parser.map_varchar_to_utf8view = false;
+0 row(s) fetched.
+Elapsed 0.001 seconds.
+
+> CREATE TABLE my_table (my_column VARCHAR);
+0 row(s) fetched.
+Elapsed 0.014 seconds.
+
+> DESCRIBE my_table;
++-------------+-----------+-------------+
+| column_name | data_type | is_nullable |
++-------------+-----------+-------------+
+| my_column   | Utf8      | YES         |
++-------------+-----------+-------------+
+1 row(s) fetched.
+Elapsed 0.004 seconds.
+```
+
 ### `ListingOptions` default for `collect_stat` changed from `true` to `false`
 
 This makes it agree with the default for `SessionConfig`.

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org