This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 26384d7de53f [SPARK-55224][PYTHON][FOLLOWUP] Remove redundant
`use_legacy_pandas_udf_conversion` condition in serializer setup
26384d7de53f is described below
commit 26384d7de53ff2efc68b9824132eb298fd9a5ff1
Author: Yicong-Huang <[email protected]>
AuthorDate: Mon Feb 9 08:19:59 2026 +0800
[SPARK-55224][PYTHON][FOLLOWUP] Remove redundant
`use_legacy_pandas_udf_conversion` condition in serializer setup
### What changes were proposed in this pull request?
Remove the redundant `or runner_conf.use_legacy_pandas_udf_conversion`
condition from `struct_in_pandas` and `ndarray_as_list` in `read_udfs`.
### Why are the changes needed?
When `use_legacy_pandas_udf_conversion=True`, `SQL_ARROW_BATCHED_UDF` falls
through to the `else` branch, where `eval_type == SQL_ARROW_BATCHED_UDF` is
already `True`, so the `or` is redundant. It also incorrectly affects other eval
types (e.g., `SQL_SCALAR_PANDAS_UDF` would get `struct_in_pandas="row"` instead
of `"dict"`).
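A minimal sketch of the two expressions, to illustrate the point above. It is
not the code in `worker.py`: the string constants stand in for the
`PythonEvalType` members, the function names and the `use_legacy` parameter are
hypothetical, and the surrounding branching in `read_udfs` is left out.

```python
# Stand-ins for the PythonEvalType members (assumption: plain strings here).
SQL_ARROW_BATCHED_UDF = "SQL_ARROW_BATCHED_UDF"
SQL_SCALAR_PANDAS_UDF = "SQL_SCALAR_PANDAS_UDF"


def old_struct_in_pandas(eval_type, use_legacy):
    # Expression before this change: the legacy flag alone forces "row".
    return "row" if (eval_type == SQL_ARROW_BATCHED_UDF or use_legacy) else "dict"


def new_struct_in_pandas(eval_type):
    # Expression after this change: only the eval type decides.
    return "row" if eval_type == SQL_ARROW_BATCHED_UDF else "dict"


for legacy in (False, True):
    # Arrow-optimized Python UDF: the flag makes no difference.
    assert old_struct_in_pandas(SQL_ARROW_BATCHED_UDF, legacy) == "row"
    assert new_struct_in_pandas(SQL_ARROW_BATCHED_UDF) == "row"

# Scalar pandas UDF with the flag on: the old expression yields "row",
# the new one keeps "dict".
print(old_struct_in_pandas(SQL_SCALAR_PANDAS_UDF, True))
print(new_struct_in_pandas(SQL_SCALAR_PANDAS_UDF))
```

The same reasoning applies to `ndarray_as_list`, which after this change
depends only on `eval_type`.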
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing UDF tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #54212 from
Yicong-Huang/SPARK-55224/fix/remove-redundant-legacy-condition.
Authored-by: Yicong-Huang <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
python/pyspark/worker.py | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index 7299c6211cf1..59d4434ab815 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -2776,19 +2776,10 @@ def read_udfs(pickleSer, infile, eval_type, runner_conf, eval_conf):
or eval_type == PythonEvalType.SQL_MAP_PANDAS_ITER_UDF
)
# Arrow-optimized Python UDF takes a struct type argument as a Row
- # When legacy pandas conversion is enabled, use "row" and convert ndarray to list
struct_in_pandas = (
- "row"
- if (
- eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF
- or runner_conf.use_legacy_pandas_udf_conversion
- )
- else "dict"
- )
- ndarray_as_list = (
- eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF
- or runner_conf.use_legacy_pandas_udf_conversion
+ "row" if eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF else "dict"
)
+ ndarray_as_list = eval_type == PythonEvalType.SQL_ARROW_BATCHED_UDF
# Arrow-optimized Python UDF takes input types
input_type = (
_parse_datatype_json_string(utf8_deserializer.loads(infile))