This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new f0e89994fd41 [SPARK-53265][PYTHON][DOCS] Add Arrow Python UDF Type Coercion Tables in Arrow Python UDF Docs

f0e89994fd41 is described below

commit f0e89994fd41b215b040d278b384fbd52eae6d96
Author: Amanda Liu <amanda....@databricks.com>
AuthorDate: Fri Aug 22 09:10:57 2025 +0800

    [SPARK-53265][PYTHON][DOCS] Add Arrow Python UDF Type Coercion Tables in Arrow Python UDF Docs

    ### What changes were proposed in this pull request?

    Link Arrow Python UDF Type Coercion Tables in Arrow Python UDF Docs, from https://github.com/apache/spark/pull/51225.

    This PR replaces https://github.com/apache/spark/pull/52004 due to docs build failure.

    ### Why are the changes needed?

    Improve documentation of behavior change

    ### Does this PR introduce _any_ user-facing change?

    Yes, updates docs

    ### How was this patch tested?

    Docs build

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #52025 from asl3/arrowpandasudf-typecoerciontabledoc2.

    Authored-by: Amanda Liu <amanda....@databricks.com>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/docs/source/tutorial/sql/arrow_pandas.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/python/docs/source/tutorial/sql/arrow_pandas.rst b/python/docs/source/tutorial/sql/arrow_pandas.rst
index eea45835d4c8..3bef50874d7f 100644
--- a/python/docs/source/tutorial/sql/arrow_pandas.rst
+++ b/python/docs/source/tutorial/sql/arrow_pandas.rst
@@ -375,6 +375,12 @@ fallback for type mismatches, leading to potential ambiguity and data loss. Addi
 and tuples to strings can yield ambiguous results. Arrow Python UDFs, on the other hand, leverage Arrow's capabilities to standardize type coercion and address these issues effectively.
 
+A note on Arrow Python UDF type coercion: In Spark 4.1, unnecessary conversion to pandas instances is removed in the serializer
+when ``spark.sql.execution.pythonUDF.arrow.enabled`` is enabled. As a result, the type coercion changes
+when the produced output has a schema different from the specified schema. To restore the previous behavior,
+enable ``spark.sql.legacy.execution.pythonUDF.pandas.conversion.enabled``.
+The behavior difference is summarized in the tables `here <https://github.com/apache/spark/pull/51225>`__.
+
 Usage Notes
 -----------
 
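The documentation note in this commit names two Spark SQL settings. One enables Arrow Python UDFs. The other, when enabled, restores the type coercion behavior from before Spark 4.1. The sketch below collects the two settings as plain key/value strings so they can be applied together. It is illustrative only. The helper arrow_udf_confs and its constants are hypothetical and are not part of PySpark. Only the two configuration keys come from the commit.

```python
from typing import Dict

# Configuration keys quoted from the documentation note in this commit.
ARROW_UDF_CONF = "spark.sql.execution.pythonUDF.arrow.enabled"
LEGACY_PANDAS_CONVERSION_CONF = (
    "spark.sql.legacy.execution.pythonUDF.pandas.conversion.enabled"
)


def arrow_udf_confs(restore_legacy_coercion: bool) -> Dict[str, str]:
    """Return Spark SQL settings for Arrow Python UDFs.

    Hypothetical helper (not a PySpark API). When restore_legacy_coercion
    is True, the legacy flag is enabled. Per the note, this restores the
    type coercion behavior from before Spark 4.1.
    """
    return {
        ARROW_UDF_CONF: "true",
        LEGACY_PANDAS_CONVERSION_CONF: "true" if restore_legacy_coercion else "false",
    }


if __name__ == "__main__":
    # With PySpark available, the settings could be applied to a live session:
    #   for key, value in arrow_udf_confs(True).items():
    #       spark.conf.set(key, value)
    for key, value in arrow_udf_confs(restore_legacy_coercion=True).items():
        print(f"{key}={value}")
```

In a running PySpark session, each pair can be applied with spark.conf.set(key, value). Before a session is created, each pair can also be passed to SparkSession.builder.config(key, value).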