This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 4e55f6e93b81 [SPARK-54666][PS] Leave numeric types unchanged on `to_numeric`
4e55f6e93b81 is described below
commit 4e55f6e93b819377558c73c5181144d99dca3981
Author: Devin Petersohn <[email protected]>
AuthorDate: Wed Feb 25 07:15:32 2026 +0900
[SPARK-54666][PS] Leave numeric types unchanged on `to_numeric`
### What changes were proposed in this pull request?
Fix a bug where numeric Series datatypes would change to float (and
potentially lose precision) on `to_numeric`.
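A minimal sketch of the precision loss, using only the standard library rather than Spark. It assumes the pre-fix conversion went through a 32-bit float, which this message does not state explicitly; `through_float32` is a hypothetical helper written for this illustration.

```python
import struct


def through_float32(value: int) -> float:
    """Round-trip an integer through an IEEE 754 single-precision float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


# Value taken from the regression test added in this commit.
original = -1554478299
as_float32 = through_float32(original)

# A 32-bit float has a 24-bit significand, so integers with magnitude above
# 2**24 are not always exactly representable and get rounded.
print(original, as_float32, int(as_float32) == original)
```

Small integers such as `2` survive the round trip unchanged, which is why the problem only shows up for large values.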
### Why are the changes needed?
This is a bug: the behavior matches neither pandas nor user expectations.
### Does this PR introduce _any_ user-facing change?
Yes. It fixes the bug: numeric Series now keep their dtype on `to_numeric`.
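The user-facing effect can be pictured with a toy model of the guard in the diff. This is a sketch only: `ToySeries`, `NUMERIC_OR_BOOL`, and `to_numeric_sketch` are hypothetical stand-ins, not PySpark APIs, and the fallback branch is simplified to a plain float conversion.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for a pandas-on-Spark Series; not a real PySpark class.
@dataclass
class ToySeries:
    values: List[object]
    dtype: str


# Toy dtype names standing in for Spark's NumericType subclasses and BooleanType.
NUMERIC_OR_BOOL = {"int32", "int64", "float32", "float64", "decimal", "bool"}


def to_numeric_sketch(s: ToySeries) -> ToySeries:
    # Mirrors the guard added in this commit: numeric and boolean input is
    # copied and returned with its dtype untouched.
    if s.dtype in NUMERIC_OR_BOOL:
        return ToySeries(list(s.values), s.dtype)
    # Everything else is still converted to floats (simplified here).
    return ToySeries([float(v) for v in s.values], "float32")


ints = ToySeries([-1554478299, 2], "int64")
strs = ToySeries(["1.5", "2"], "string")
print(to_numeric_sketch(ints))
print(to_numeric_sketch(strs))
```

Returning a copy rather than the input itself keeps the result independent of the caller's Series, matching the `arg.copy()` call in the patch.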
### How was this patch tested?
CI, including a new regression test in `test_namespace.py`.
### Was this patch authored or co-authored using generative AI tooling?
Co-authored-by: Claude Sonnet 4.5
Closes #54403 from devin-petersohn/devin/to_numeric_downcast.
Authored-by: Devin Petersohn <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/namespace.py | 3 +++
python/pyspark/pandas/tests/test_namespace.py | 5 +++++
2 files changed, 8 insertions(+)
diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index 2213bfcc6aa1..3ff617c0ee3f 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -64,6 +64,7 @@ from pyspark.sql.types import (
     FloatType,
     DoubleType,
     BooleanType,
+    NumericType,
     TimestampType,
     TimestampNTZType,
     DecimalType,
@@ -3651,6 +3652,8 @@ def to_numeric(arg, errors="raise"):
     1.0
     """
     if isinstance(arg, Series):
+        if isinstance(arg.spark.data_type, (NumericType, BooleanType)):
+            return arg.copy()
         if errors == "coerce":
             spark_session = arg._internal.spark_frame.sparkSession
             if is_ansi_mode_enabled(spark_session):
diff --git a/python/pyspark/pandas/tests/test_namespace.py b/python/pyspark/pandas/tests/test_namespace.py
index 8a267f76c536..f68a637723f7 100644
--- a/python/pyspark/pandas/tests/test_namespace.py
+++ b/python/pyspark/pandas/tests/test_namespace.py
@@ -607,6 +607,11 @@ class NamespaceTestsMixin:
             lambda: ps.to_numeric(psser, errors="ignore"),
         )
 
+        # SPARK-54666: Series with numeric dtype should be returned as-is.
+        pser = pd.Series([-1554478299, 2])
+        psser = ps.from_pandas(pser)
+        self.assert_eq(pd.to_numeric(pser), ps.to_numeric(psser))
+
     def test_json_normalize(self):
         # Basic test case with a simple JSON structure
         data = [