This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 4e55f6e93b81 [SPARK-54666][PS] Leave numeric types unchanged on `to_numeric`
4e55f6e93b81 is described below

commit 4e55f6e93b819377558c73c5181144d99dca3981
Author: Devin Petersohn <[email protected]>
AuthorDate: Wed Feb 25 07:15:32 2026 +0900

    [SPARK-54666][PS] Leave numeric types unchanged on `to_numeric`
    
    ### What changes were proposed in this pull request?
    
    Fix a bug where numeric Series datatypes would change to float (and potentially lose precision) on `to_numeric`.
    
    ### Why are the changes needed?
    
    It is a bug: the previous behavior matched neither pandas behavior nor user expectations.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, it fixes the bug: `to_numeric` now leaves the dtype of a numeric Series unchanged.
    
    ### How was this patch tested?
    
    CI, including a new regression test in `python/pyspark/pandas/tests/test_namespace.py`.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Co-authored-by: Claude Sonnet 4.5
    
    Closes #54403 from devin-petersohn/devin/to_numeric_downcast.
    
    Authored-by: Devin Petersohn <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/pandas/namespace.py            | 3 +++
 python/pyspark/pandas/tests/test_namespace.py | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index 2213bfcc6aa1..3ff617c0ee3f 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -64,6 +64,7 @@ from pyspark.sql.types import (
     FloatType,
     DoubleType,
     BooleanType,
+    NumericType,
     TimestampType,
     TimestampNTZType,
     DecimalType,
@@ -3651,6 +3652,8 @@ def to_numeric(arg, errors="raise"):
     1.0
     """
     if isinstance(arg, Series):
+        if isinstance(arg.spark.data_type, (NumericType, BooleanType)):
+            return arg.copy()
         if errors == "coerce":
             spark_session = arg._internal.spark_frame.sparkSession
             if is_ansi_mode_enabled(spark_session):
diff --git a/python/pyspark/pandas/tests/test_namespace.py b/python/pyspark/pandas/tests/test_namespace.py
index 8a267f76c536..f68a637723f7 100644
--- a/python/pyspark/pandas/tests/test_namespace.py
+++ b/python/pyspark/pandas/tests/test_namespace.py
@@ -607,6 +607,11 @@ class NamespaceTestsMixin:
             lambda: ps.to_numeric(psser, errors="ignore"),
         )
 
+        # SPARK-54666: Series with numeric dtype should be returned as-is.
+        pser = pd.Series([-1554478299, 2])
+        psser = ps.from_pandas(pser)
+        self.assert_eq(pd.to_numeric(pser), ps.to_numeric(psser))
+
     def test_json_normalize(self):
         # Basic test case with a simple JSON structure
         data = [
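
The precision loss mentioned in the commit message can be illustrated without Spark. The sketch below is not part of the patch. It assumes, for illustration only, that the old conversion went through a 32-bit float (the commit message says only "float"), and it uses a hypothetical helper `to_float32` built on the standard-library `struct` module. The integer is the one used in the new regression test.

```python
import struct


def to_float32(value: int) -> float:
    """Round-trip an integer through a 32-bit IEEE 754 float.

    Hypothetical helper for illustration; it is not part of pyspark.
    """
    return struct.unpack("f", struct.pack("f", float(value)))[0]


# Value taken from the regression test added in this commit.
original = -1554478299
as_float32 = to_float32(original)

# A 32-bit float has a 24-bit significand, so an integer of this
# magnitude cannot be represented exactly and gets rounded.
print(original, as_float32, as_float32 == original)

# Small integers survive the round trip, which is why the bug could
# go unnoticed with small test values.
print(2, to_float32(2), to_float32(2) == 2)
```

Returning a numeric Series early, as the patch does, avoids any such cast, so the integer dtype and the exact values are preserved.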

