This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 5c52a00e8f4f [SPARK-53265][PYTHON][DOCS] Add Arrow Python UDF Type Coercion Tables in Arrow Python UDF Docs
5c52a00e8f4f is described below

commit 5c52a00e8f4f5f2366337ce4b8e935758401632e
Author: Amanda Liu <amanda....@databricks.com>
AuthorDate: Wed Aug 13 13:45:58 2025 +0800

    [SPARK-53265][PYTHON][DOCS] Add Arrow Python UDF Type Coercion Tables in Arrow Python UDF Docs
    
    ### What changes were proposed in this pull request?
    
    Add Arrow Python UDF Type Coercion Tables in Arrow Python UDF Docs, from https://github.com/apache/spark/pull/51225
    
    ### Why are the changes needed?
    
    Improve documentation of behavior change
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, updates docs
    
    ### How was this patch tested?
    
    Docs build
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #52004 from asl3/arrowpandasudf-typecoerciontabledoc.
    
    Authored-by: Amanda Liu <amanda....@databricks.com>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/docs/source/tutorial/sql/arrow_pandas.rst | 43 ++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/python/docs/source/tutorial/sql/arrow_pandas.rst b/python/docs/source/tutorial/sql/arrow_pandas.rst
index eea45835d4c8..30ead20c7314 100644
--- a/python/docs/source/tutorial/sql/arrow_pandas.rst
+++ b/python/docs/source/tutorial/sql/arrow_pandas.rst
@@ -375,6 +375,49 @@ fallback for type mismatches, leading to potential ambiguity and data loss. Addi
 and tuples to strings can yield ambiguous results. Arrow Python UDFs, on the other hand, leverage Arrow's
 capabilities to standardize type coercion and address these issues effectively.
 
+A note on Arrow Python UDF type coercion: In Spark 4.1, unnecessary conversion to pandas instances is removed in the serializer
+when ``spark.sql.execution.pythonUDF.arrow.enabled`` is enabled. As a result, the type coercion changes
+when the produced output has a schema different from the specified schema. To restore the previous behavior,
+enable ``spark.sql.legacy.execution.pythonUDF.pandas.conversion.enabled``. The behavior difference is summarized in the tables below.
+
+Legacy type coercion:
+
+.. csv-table::
+   :header: "SQL Type", "None", "True", "1", "a", "date", "datetime", "1.0", "array", "[1]", "(1,)", "bytearray", "Decimal", "dict"
+   :widths: 12, 6, 6, 6, 6, 10, 12, 6, 8, 6, 6, 10, 8, 8
+
+   "boolean", "None", "True", "True", "X", "X", "X", "True", "X", "X", "X", "X", "X", "X"
+   "tinyint", "None", "1", "1", "X", "X", "X", "1", "X", "X", "X", "X", "1", "X"
+   "smallint", "None", "1", "1", "X", "X", "X", "1", "X", "X", "X", "X", "1", "X"
+   "int", "None", "1", "1", "X", "0", "X", "1", "X", "X", "X", "X", "1", "X"
+   "bigint", "None", "1", "1", "X", "X", "0", "1", "X", "X", "X", "X", "1", "X"
+   "string", "None", "'True'", "'1'", "'a'", "'1970-01-01'", "'1970-01-01 00:00...'", "'1.0'", "\"array('i', [1])\"", "'[1]'", "'(1,)'", "\"bytearray(b'ABC')\"", "'1'", "\"{'a': 1}\""
+   "date", "None", "X", "X", "X", "datetime.date(197...)", "datetime.date(197...)", "X", "X", "X", "X", "X", "datetime.date(197...)", "X"
+   "timestamp", "None", "X", "datetime.datetime...", "X", "X", "datetime.datetime...", "X", "X", "X", "X", "X", "datetime.datetime...", "X"
+   "float", "None", "1.0", "1.0", "X", "X", "X", "1.0", "X", "X", "X", "X", "1.0", "X"
+   "double", "None", "1.0", "1.0", "X", "X", "X", "1.0", "X", "X", "X", "X", "1.0", "X"
+   "binary", "None", "bytearray(b'\\x00')", "bytearray(b'\\x00')", "X", "X", "X", "X", "bytearray(b'\\x01\\...", "bytearray(b'\\x01')", "bytearray(b'\\x01')", "bytearray(b'ABC')", "X", "X"
+   "decimal(10,0)", "None", "X", "X", "X", "X", "X", "Decimal('1')", "X", "X", "X", "X", "Decimal('1')", "X"
+
+New type coercion:
+
+.. csv-table::
+   :header: "SQL Type", "None", "True", "1", "a", "date", "datetime", "1.0", "array", "[1]", "(1,)", "bytearray", "Decimal", "dict"
+   :widths: 12, 6, 6, 6, 6, 10, 12, 6, 8, 6, 6, 10, 8, 8
+
+   "boolean", "None", "True", "True", "X", "X", "X", "True", "X", "X", "X", "X", "X", "X"
+   "tinyint", "None", "X", "1", "X", "X", "X", "1", "X", "X", "X", "X", "1", "X"
+   "smallint", "None", "X", "1", "X", "X", "X", "1", "X", "X", "X", "X", "1", "X"
+   "int", "None", "X", "1", "X", "0", "X", "1", "X", "X", "X", "X", "1", "X"
+   "bigint", "None", "X", "1", "X", "X", "0", "1", "X", "X", "X", "X", "1", "X"
+   "string", "None", "'true'", "'1'", "'a'", "'1970-01-01'", "'1970-01-01 00:00...'", "'1.0'", "\"array('i', [1])\"", "'[1]'", "'(1,)'", "\"bytearray(b'ABC')\"", "'1'", "\"{'a': 1}\""
+   "date", "None", "X", "datetime.date(197...)", "X", "datetime.date(197...)", "datetime.date(197...)", "datetime.date(197...)", "X", "X", "X", "X", "datetime.date(197...)", "X"
+   "timestamp", "None", "X", "X", "X", "X", "datetime.datetime...", "X", "X", "X", "X", "X", "X", "X"
+   "float", "None", "1.0", "1.0", "X", "X", "X", "1.0", "X", "X", "X", "X", "1.0", "X"
+   "double", "None", "1.0", "1.0", "X", "X", "X", "1.0", "X", "X", "X", "X", "1.0", "X"
+   "binary", "None", "X", "X", "X", "X", "X", "X", "X", "X", "X", "bytearray(b'ABC')", "X", "X"
+   "decimal(10,0)", "None", "X", "X", "X", "X", "X", "X", "X", "X", "X", "X", "Decimal('1')", "X"
+
 Usage Notes
 -----------
 
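
To illustrate the behavior difference summarized by the two tables in the patch above, here is a minimal sketch that compares their "True" columns and lists the SQL types whose cell changed. It assumes the column headers denote the Python value returned by the UDF. The dictionary and function names are hypothetical, the cell values are transcribed from the tables, and "X" is copied as-is: the patch does not define it, though it presumably marks a combination that does not produce a value.

```python
# Sketch: find where the legacy and new coercion tables disagree.
# Values are transcribed from the "True" column of the two tables in the patch;
# "X" is copied verbatim (its meaning is not defined in the patch).
# All names below are hypothetical and exist only for illustration.

LEGACY_TRUE = {
    "boolean": "True",
    "tinyint": "1",
    "smallint": "1",
    "int": "1",
    "bigint": "1",
    "string": "'True'",
    "date": "X",
    "timestamp": "X",
    "float": "1.0",
    "double": "1.0",
    "binary": "bytearray(b'\\x00')",
    "decimal(10,0)": "X",
}

NEW_TRUE = {
    "boolean": "True",
    "tinyint": "X",
    "smallint": "X",
    "int": "X",
    "bigint": "X",
    "string": "'true'",
    "date": "X",
    "timestamp": "X",
    "float": "1.0",
    "double": "1.0",
    "binary": "X",
    "decimal(10,0)": "X",
}


def changed_cells(legacy, new):
    """Return {sql_type: (legacy_cell, new_cell)} for rows whose cells differ."""
    return {t: (legacy[t], new[t]) for t in legacy if legacy[t] != new[t]}


if __name__ == "__main__":
    for sql_type, (old, cur) in changed_cells(LEGACY_TRUE, NEW_TRUE).items():
        print(f"{sql_type}: {old} -> {cur}")
```

For the "True" column this reports tinyint, smallint, int, bigint, string, and binary; the other six rows are identical in both tables. The same comparison can be repeated for any other column by transcribing its cells.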

