(spark) branch master updated: [SPARK-55224][PYTHON][FOLLOWUP] Remove redundant `use_legacy_pandas_udf_conversion` condition in serializer setup

ruifengz Sun, 08 Feb 2026 16:20:28 -0800

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 26384d7de53f [SPARK-55224][PYTHON][FOLLOWUP] Remove redundant `use_legacy_pandas_udf_conversion` condition in serializer setup
26384d7de53f is described below

commit 26384d7de53ff2efc68b9824132eb298fd9a5ff1
Author: Yicong-Huang <[email protected]>
AuthorDate: Mon Feb 9 08:19:59 2026 +0800

    [SPARK-55224][PYTHON][FOLLOWUP] Remove redundant `use_legacy_pandas_udf_conversion` condition in serializer setup
    
    ### What changes were proposed in this pull request?
    
    Remove the redundant `or runner_conf.use_legacy_pandas_udf_conversion` condition from `struct_in_pandas` and `ndarray_as_list` in `read_udfs`.
    
    ### Why are the changes needed?
    
    When `use_legacy_pandas_udf_conversion=True`, `SQL_ARROW_BATCHED_UDF` falls through to the `else` branch, where `eval_type == SQL_ARROW_BATCHED_UDF` is already `True`, so the `or` is redundant for that eval type. The extra condition also incorrectly affects other eval types that reach the same branch (e.g., `SQL_SCALAR_PANDAS_UDF` would get `struct_in_pandas="row"` instead of `"dict"`).
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Existing UDF tests.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #54212 from Yicong-Huang/SPARK-55224/fix/remove-redundant-legacy-condition.
    
    Authored-by: Yicong-Huang <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 python/pyspark/worker.py | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index 7299c6211cf1..59d4434ab815 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -2776,19 +2776,10 @@ def read_udfs(pickleSer, infile, eval_type, runner_conf, eval_conf):
                 or eval_type == PythonEvalType.SQL_MAP_PANDAS_ITER_UDF
             )
             # Arrow-optimized Python UDF takes a struct type argument as a Row
-            # When legacy pandas conversion is enabled, use "row" and convert ndarray to list
             struct_in_pandas = (
-                "row"
-                if (
-                    eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF
-                    or runner_conf.use_legacy_pandas_udf_conversion
-                )
-                else "dict"
-            )
-            ndarray_as_list = (
-                eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF
-                or runner_conf.use_legacy_pandas_udf_conversion
+                "row" if eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF else "dict"
             )
+            ndarray_as_list = eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF
             # Arrow-optimized Python UDF takes input types
             input_type = (
                 _parse_datatype_json_string(utf8_deserializer.loads(infile))

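The effect of the removed `or` can be sketched in isolation. This is a minimal illustration, not the code in `worker.py`: the `EvalType` enum and both helper functions are hypothetical stand-ins, and only the two conditional expressions mirror the diff above.

```python
from enum import Enum


class EvalType(Enum):
    # Hypothetical stand-ins for PythonEvalType constants (the real ones differ).
    SQL_ARROW_BATCHED_UDF = "arrow_batched"
    SQL_SCALAR_PANDAS_UDF = "scalar_pandas"


def struct_in_pandas_before(eval_type: EvalType, use_legacy: bool) -> str:
    # Mirrors the removed expression: the legacy flag forces "row" for every eval type.
    return (
        "row"
        if (eval_type == EvalType.SQL_ARROW_BATCHED_UDF or use_legacy)
        else "dict"
    )


def struct_in_pandas_after(eval_type: EvalType) -> str:
    # Mirrors the new expression: only the eval type decides.
    return "row" if eval_type == EvalType.SQL_ARROW_BATCHED_UDF else "dict"


for eval_type in EvalType:
    for use_legacy in (False, True):
        before = struct_in_pandas_before(eval_type, use_legacy)
        after = struct_in_pandas_after(eval_type)
        print(f"{eval_type.name}, legacy={use_legacy}: before={before}, after={after}")
```

Under these assumptions, the two versions agree for `SQL_ARROW_BATCHED_UDF` regardless of the flag, and differ only for the other eval type when the flag is set, which is the case the commit message describes as incorrect.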

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
