This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new dc7b8c1968f8 [SPARK-55225][PYTHON][PS] Restore to the original dtype for Datetime
dc7b8c1968f8 is described below

commit dc7b8c1968f8bed1496996b5c324cb1a7ed9b38e
Author: Tian Gao <[email protected]>
AuthorDate: Thu Jan 29 09:31:42 2026 +0800

    [SPARK-55225][PYTHON][PS] Restore to the original dtype for Datetime
    
    ### What changes were proposed in this pull request?
    
    When a pandas-on-Spark series is converted back to a pandas series, restore its dtype based on the original pandas dtype.
    
    ### Why are the changes needed?
    
    There are multiple datetime dtypes, such as `datetime64[ns]` and `datetime64[us]`, so we should honor the dtype of the original pandas dataframe.
    
    Specifically, pandas 3 chooses the datetime unit automatically when datetimes are created, so pinning the dtype to `datetime64[ns]` won't work in most cases.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. With pandas 3, the datetime unit of the resulting series may differ from before. For backward compatibility, the existing behavior is kept for pandas versions below 3.
    
    ### How was this patch tested?
    
    The previously failing tests now pass locally. CI runs pandas 2, so it cannot exercise the new pandas 3 behavior; the pandas 2 path should not be impacted.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #54017 from gaogaotiantian/pandas3-restore-datetime.
    
    Authored-by: Tian Gao <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 python/pyspark/pandas/data_type_ops/datetime_ops.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/python/pyspark/pandas/data_type_ops/datetime_ops.py b/python/pyspark/pandas/data_type_ops/datetime_ops.py
index 6679cb5783ae..07399ecd873e 100644
--- a/python/pyspark/pandas/data_type_ops/datetime_ops.py
+++ b/python/pyspark/pandas/data_type_ops/datetime_ops.py
@@ -23,6 +23,7 @@ import numpy as np
 import pandas as pd
 from pandas.api.types import CategoricalDtype
 
+from pyspark.loose_version import LooseVersion
 from pyspark.sql import Column, functions as F
 from pyspark.sql.types import (
     BooleanType,
@@ -128,6 +129,13 @@ class DatetimeOps(DataTypeOps):
         """Prepare column when from_pandas."""
         return col
 
+    def restore(self, col: pd.Series) -> pd.Series:
+        """Restore column when to_pandas."""
+        if LooseVersion(pd.__version__) < "3.0.0":
+            return col
+        else:
+            return col.astype(self.dtype)
+
     def astype(self, index_ops: IndexOpsLike, dtype: Union[str, type, Dtype]) -> IndexOpsLike:
         dtype, spark_type = pandas_on_spark_type(dtype)
 

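The version gate in `restore` can be illustrated outside of Spark with plain pandas. The following is a minimal sketch, not the code from the commit: `restore_datetime` is a hypothetical stand-alone function, `original_dtype` stands in for `self.dtype`, and a simple major-version check replaces `LooseVersion`. The `pandas_version` parameter exists only so both branches can be exercised on one installation. It assumes pandas 2.0 or later, where non-nanosecond units such as `datetime64[us]` are supported.

```python
import pandas as pd


def restore_datetime(col, original_dtype, pandas_version=pd.__version__):
    """Hypothetical stand-alone analogue of DatetimeOps.restore."""
    # Simplified version check (assumption): compare the major version only.
    major = int(pandas_version.split(".")[0])
    if major < 3:
        # Below pandas 3: return the column untouched, as the commit does.
        return col
    # pandas 3 and later: cast back to the dtype of the original pandas data.
    return col.astype(original_dtype)


# A column pinned to nanosecond resolution, standing in for the old behavior.
pinned = pd.Series(pd.to_datetime(["2026-01-29 09:31:42"])).astype("datetime64[ns]")

# Pretend the original pandas series used microsecond resolution.
unchanged = restore_datetime(pinned, "datetime64[us]", pandas_version="2.2.3")
restored = restore_datetime(pinned, "datetime64[us]", pandas_version="3.0.0")

print(unchanged.dtype)
print(restored.dtype)
```

With the simulated pandas 2 version the column keeps its nanosecond dtype, while with the simulated pandas 3 version it is cast back to the original microsecond dtype. The timestamp value itself is the same in both cases.
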
