This is an automated email from the ASF dual-hosted git repository.

HyukjinKwon pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.0 by this push:
     new 04fcfb737da6 [SPARK-56584][PYTHON][4.0] Generalize `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` error class and remove dead `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
04fcfb737da6 is described below

commit 04fcfb737da6bb913914d7377d21be98cff64a43
Author: Yicong Huang <[email protected]>
AuthorDate: Sun May 10 18:02:06 2026 +0900

    [SPARK-56584][PYTHON][4.0] Generalize `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` error class and remove dead `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF`
    
    ### What changes were proposed in this pull request?
    
    Backport of #55494 to branch-4.0.
    
    The original change:
    1. Renames the error class `RESULT_TYPE_MISMATCH_FOR_ARROW_UDF` to `RESULT_COLUMN_TYPES_MISMATCH` (parallel to `RESULT_COLUMN_NAMES_MISMATCH` / `RESULT_COLUMN_SCHEMA_MISMATCH`).
    2. Rewords the message from `Columns do not match in their data type: <mismatch>.` to `Column types of the returned data do not match specified schema. Mismatch: <mismatch>.` to align with sibling errors.
    3. Removes the dead error class `SCHEMA_MISMATCH_FOR_ARROW_PYTHON_UDF` (already absent on branch-4.0, so this step is a no-op for this branch).
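A minimal sketch of how the renamed entry's `<mismatch>` placeholder could be filled in. The JSON excerpt mirrors the new `RESULT_COLUMN_TYPES_MISMATCH` entry, but the `render` helper and the sample mismatch text are hypothetical; PySpark performs its own template substitution internally, which this sketch does not claim to reproduce.

```python
import json

# Excerpt mirroring the renamed entry in error-conditions.json.
ERROR_CONDITIONS = json.loads("""
{
  "RESULT_COLUMN_TYPES_MISMATCH": {
    "message": [
      "Column types of the returned data do not match specified schema. Mismatch: <mismatch>."
    ]
  }
}
""")


def render(error_class: str, params: dict) -> str:
    """Hypothetical helper: join the message lines and substitute each <name> placeholder."""
    template = "\n".join(ERROR_CONDITIONS[error_class]["message"])
    for name, value in params.items():
        template = template.replace(f"<{name}>", value)
    return template


# Made-up mismatch detail in the format built by worker.py.
msg = render(
    "RESULT_COLUMN_TYPES_MISMATCH",
    {"mismatch": "column 'id' (expected int32, actual int64)"},
)
print(msg)
```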
    
    Branch-4.1 backport: #55670.
    
    ### Why are the changes needed?
    
    This restores message parity between the master server and the branch-4.0 client. The scheduled cross-version Connect parity build was failing because master raises the new `RESULT_COLUMN_TYPES_MISMATCH` text while branch-4.0 client tests still assert the old "Columns do not match in their data type" text:
    
    https://github.com/apache/spark/actions/runs/25187494316
    
    Backporting keeps the Arrow result-verification error class name and message consistent across maintained branches and unblocks the cross-version parity tests.
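The parity failure can be illustrated with a short sketch, assuming the client assertion behaves like `assertRaisesRegex`, i.e. a `re.search` of the pattern against the error text. The mismatch detail string is a made-up example.

```python
import re

# Message text the master server now raises (the mismatch detail is made up).
mismatch = "column 'id' (expected int32, actual int64)"
new_message = (
    "Column types of the returned data do not match specified schema. "
    f"Mismatch: {mismatch}."
)

# Pattern asserted by branch-4.0 client tests before this backport.
old_pattern = "Columns do not match in their data type: " + re.escape(mismatch)
# Pattern asserted after this backport.
new_pattern = (
    "Column types of the returned data do not match specified schema. "
    "Mismatch: " + re.escape(mismatch)
)

old_matches = re.search(old_pattern, new_message) is not None
new_matches = re.search(new_pattern, new_message) is not None
print(old_matches, new_matches)  # False True
```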
    
    ### Conflicts resolved
    
    - `python/pyspark/errors/error-conditions.json`: kept the `RETRIES_EXCEEDED` entry (only present on branch-4.0).
    - `python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py`: kept the branch-4.0 `lambda table: table` direct call form (master uses a `function_variations(...)` loop helper that is not present on branch-4.0); only the assertion message text is updated.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes (same as #55494). The user-visible error class name and message for result column type mismatches in Arrow UDFs change on branch-4.0.
    
    ### How was this patch tested?
    
    Existing tests; updated 4 asserts in `test_arrow_grouped_map.py` / `test_arrow_cogrouped_map.py` to match the new message.
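For reference, a self-contained sketch of the updated assertion shape. A plain `RuntimeError` and the hypothetical `fake_udf_failure` function stand in for the `PythonException` raised by a real `applyInArrow` call, and the escaped `expected` value is an assumption about how such a pattern might be built.

```python
import unittest


def fake_udf_failure(detail: str) -> None:
    """Hypothetical stand-in for an Arrow UDF whose result column types do not match."""
    raise RuntimeError(
        "Column types of the returned data do not match specified schema. "
        f"Mismatch: {detail}."
    )


class ResultTypeMessageTest(unittest.TestCase):
    def test_new_message(self):
        # assertRaisesRegex treats the pattern as a regex, so the
        # parentheses in the mismatch detail are escaped here.
        expected = r"column 'id' \(expected int32, actual int64\)"
        with self.assertRaisesRegex(
            RuntimeError,
            # Adjacent string literals concatenate into one pattern,
            # mirroring the two-line form used in the updated tests.
            "Column types of the returned data do not match specified schema. "
            f"Mismatch: {expected}",
        ):
            fake_udf_failure("column 'id' (expected int32, actual int64)")
```

Saved to a module, it can be run with `python -m unittest`.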
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #55671 from Yicong-Huang/SPARK-56584-4.0.
    
    Authored-by: Yicong Huang <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/errors/error-conditions.json                | 8 ++++----
 python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py | 6 ++++--
 python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py   | 6 ++++--
 python/pyspark/worker.py                                   | 2 +-
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/python/pyspark/errors/error-conditions.json b/python/pyspark/errors/error-conditions.json
index 49c5856934d3..a13b371c1c7f 100644
--- a/python/pyspark/errors/error-conditions.json
+++ b/python/pyspark/errors/error-conditions.json
@@ -890,14 +890,14 @@
       "Number of columns of the returned data doesn't match specified schema. Expected: <expected> Actual: <actual>"
     ]
   },
-  "RESULT_ROWS_MISMATCH": {
+  "RESULT_COLUMN_TYPES_MISMATCH": {
     "message": [
-      "The number of output rows (<output_length>) must match the number of input rows (<input_length>)."
+      "Column types of the returned data do not match specified schema. Mismatch: <mismatch>."
     ]
   },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
+  "RESULT_ROWS_MISMATCH": {
     "message": [
-      "Columns do not match in their data type: <mismatch>."
+      "The number of output rows (<output_length>) must match the number of input rows (<input_length>)."
     ]
   },
   "RETRIES_EXCEEDED": {
diff --git a/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py b/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
index bc45e59639d1..88e01c9d2bba 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_cogrouped_map.py
@@ -148,7 +148,8 @@ class CogroupedMapInArrowTestsMixin:
                 with self.quiet():
                     with self.assertRaisesRegex(
                         PythonException,
-                        f"Columns do not match in their data type: {expected}",
+                        "Column types of the returned data do not match specified schema. "
+                        f"Mismatch: {expected}",
                     ):
                         self.cogrouped.applyInArrow(
                             lambda left, right: left, schema=schema
@@ -172,7 +173,8 @@ class CogroupedMapInArrowTestsMixin:
                     with self.quiet():
                         with self.assertRaisesRegex(
                             PythonException,
-                            f"Columns do not match in their data type: {expected}",
+                            "Column types of the returned data do not match specified schema. "
+                            f"Mismatch: {expected}",
                         ):
                             self.cogrouped.applyInArrow(
                                 lambda left, right: left, schema=schema
diff --git a/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py b/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
index 251c60a27f22..94058977376b 100644
--- a/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
+++ b/python/pyspark/sql/tests/arrow/test_arrow_grouped_map.py
@@ -133,7 +133,8 @@ class GroupedMapInArrowTestsMixin:
                 with self.quiet():
                     with self.assertRaisesRegex(
                         PythonException,
-                        f"Columns do not match in their data type: {expected}",
+                        "Column types of the returned data do not match specified schema. "
+                        f"Mismatch: {expected}",
                     ):
                         df.groupby("id").applyInArrow(lambda table: table, schema=schema).collect()
 
@@ -157,7 +158,8 @@ class GroupedMapInArrowTestsMixin:
                     with self.quiet():
                         with self.assertRaisesRegex(
                             PythonException,
-                            f"Columns do not match in their data type: {expected}",
+                            "Column types of the returned data do not match specified schema. "
+                            f"Mismatch: {expected}",
                         ):
                             df.groupby("id").applyInArrow(
                                 lambda table: table, schema=schema
diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index 7ff60bd0258b..2d2efed09c4f 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -475,7 +475,7 @@ def verify_arrow_result(table, assign_cols_by_name, expected_cols_and_types):
 
         if type_mismatch:
             raise PySparkRuntimeError(
-                errorClass="RESULT_TYPE_MISMATCH_FOR_ARROW_UDF",
+                errorClass="RESULT_COLUMN_TYPES_MISMATCH",
                 messageParameters={
                     "mismatch": ", ".join(
                         "column '{}' (expected {}, actual {})".format(name, expected, actual)
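The type check around the changed `errorClass` line can be approximated as follows. This is a hypothetical, dependency-free sketch: types are plain strings rather than Arrow types, columns are paired by position, and the `assign_cols_by_name` option visible in the hunk header is ignored. Only the `column '{}' (expected {}, actual {})` format string and the message text come from the diff; `ResultTypesMismatch` and `check_result_types` are illustrative names.

```python
from typing import List, Tuple


class ResultTypesMismatch(RuntimeError):
    """Hypothetical stand-in for PySparkRuntimeError(errorClass="RESULT_COLUMN_TYPES_MISMATCH")."""


def check_result_types(
    actual_cols_and_types: List[Tuple[str, str]],
    expected_cols_and_types: List[Tuple[str, str]],
) -> None:
    # Pair columns by position; zip stops at the shorter list, so a column
    # count mismatch is assumed to be reported by a separate check.
    type_mismatch = [
        (name, expected, actual)
        for (name, expected), (_, actual) in zip(
            expected_cols_and_types, actual_cols_and_types
        )
        if expected != actual
    ]
    if type_mismatch:
        # Same per-column format string as in worker.py.
        mismatch = ", ".join(
            "column '{}' (expected {}, actual {})".format(name, expected, actual)
            for name, expected, actual in type_mismatch
        )
        raise ResultTypesMismatch(
            "Column types of the returned data do not match specified schema. "
            f"Mismatch: {mismatch}."
        )
```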

