This is an automated email from the ASF dual-hosted git repository.
maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b57d863afda1 [SPARK-49353][SQL] Update docs related to `UTF-32` encoding/decoding
b57d863afda1 is described below
commit b57d863afda1dd78c3bdea8fcb02a3bd55cb137f
Author: panbingkun <[email protected]>
AuthorDate: Thu Aug 22 18:20:48 2024 +0200
[SPARK-49353][SQL] Update docs related to `UTF-32` encoding/decoding
### What changes were proposed in this pull request?
This PR aims to update the related docs now that `encoding` and `decoding`
support `UTF-32`, including:
- the `doc` of the SQL config `spark.sql.legacy.javaCharsets` (see the config sketch after this list)
- connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
- sql/core/src/main/scala/org/apache/spark/sql/functions.scala
- python/pyspark/sql/functions/builtin.py
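To illustrate what the config controls, here is a minimal sketch (an illustration only, not part of this commit): it assumes a local SparkSession named `spark` and uses `windows-1252` purely as an example of a charset that the JDK provides but that is not in the built-in list.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.encode

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// With the default (false), encode()/decode() accept only the charsets listed in the doc;
// enabling the legacy flag falls back to any charset the JDK provides.
spark.conf.set("spark.sql.legacy.javaCharsets", "true")
Seq("Spark").toDF("s").select(encode($"s", "windows-1252")).show()
```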
### Why are the changes needed?
After PR https://github.com/apache/spark/pull/46469, `UTF-32` is already supported
for string encoding and decoding, but some related documents were not updated in sync.
Let's update them to avoid misunderstandings for end users and developers.
https://github.com/apache/spark/blob/e93c5fbe81d21f8bf2ce52867013d06a63c7956e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharsetProvider.scala#L26
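For context, a minimal round-trip sketch of the behavior the updated docs describe (illustrative only, assuming a local SparkSession; not taken from this commit):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{decode, encode}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// encode() produces a binary column from a string, decode() turns it back into a string;
// 'UTF-32' is now accepted alongside the previously documented charsets.
val df = Seq("Spark").toDF("s")
df.select(decode(encode($"s", "UTF-32"), "UTF-32").alias("roundtrip")).show()
// +---------+
// |roundtrip|
// +---------+
// |    Spark|
// +---------+
```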
### Does this PR introduce _any_ user-facing change?
Yes, doc fixes only.
### How was this patch tested?
No, this only fixes some docs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #47844 from panbingkun/SPARK-49353.
Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
---
.../jvm/src/main/scala/org/apache/spark/sql/functions.scala | 8 ++++----
python/pyspark/sql/functions/builtin.py | 4 ++--
.../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 3 ++-
sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 ++--
4 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index c0bf9c9d013c..3b6675362d55 100644
--- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3840,8 +3840,8 @@ object functions {
/**
* Computes the first argument into a string from a binary using the provided character set (one
- * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument
- * is null, the result will also be null.
+ * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'). If either
+ * argument is null, the result will also be null.
*
* @group string_funcs
* @since 3.4.0
@@ -3851,8 +3851,8 @@ object functions {
/**
* Computes the first argument into a binary from a string using the provided character set (one
- * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument
- * is null, the result will also be null.
+ * of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32'). If either
+ * argument is null, the result will also be null.
*
* @group string_funcs
* @since 3.4.0
diff --git a/python/pyspark/sql/functions/builtin.py b/python/pyspark/sql/functions/builtin.py
index 24b8ae82e99a..387a039758f1 100644
--- a/python/pyspark/sql/functions/builtin.py
+++ b/python/pyspark/sql/functions/builtin.py
@@ -10989,7 +10989,7 @@ def concat_ws(sep: str, *cols: "ColumnOrName") -> Column:
def decode(col: "ColumnOrName", charset: str) -> Column:
"""
Computes the first argument into a string from a binary using the provided character set
- (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').
.. versionadded:: 1.5.0
@@ -11027,7 +11027,7 @@ def decode(col: "ColumnOrName", charset: str) -> Column:
def encode(col: "ColumnOrName", charset: str) -> Column:
"""
Computes the first argument into a binary from a string using the provided character set
- (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').
.. versionadded:: 1.5.0
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index e3f3350ed636..a2bc56a73bc4 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -5090,7 +5090,8 @@ object SQLConf {
.internal()
.doc("When set to true, the functions like `encode()` can use charsets
from JDK while " +
"encoding or decoding string values. If it is false, such functions
support only one of " +
- "the charsets: 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE',
'UTF-16LE', 'UTF-16'.")
+ "the charsets: 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE',
'UTF-16LE', 'UTF-16', " +
+ "'UTF-32'.")
.version("4.0.0")
.booleanConf
.createWithDefault(false)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index be83444a8fd3..62315123a858 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3752,7 +3752,7 @@ object functions {
/**
* Computes the first argument into a string from a binary using the provided character set
- * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').
* If either argument is null, the result will also be null.
*
* @group string_funcs
@@ -3763,7 +3763,7 @@ object functions {
/**
* Computes the first argument into a binary from a string using the provided character set
- * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
+ * (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', 'UTF-32').
* If either argument is null, the result will also be null.
*
* @group string_funcs
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]