(spark) branch master updated: [SPARK-54987][PYTHON][DOCS][FOLLOW-UP] Update the docstring of `to_arrow_type`

ruifengz Sun, 18 Jan 2026 17:14:17 -0800

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 236e71d67540 [SPARK-54987][PYTHON][DOCS][FOLLOW-UP] Update the docstring of `to_arrow_type`
236e71d67540 is described below

commit 236e71d67540139c1d6fca8d62a643cd9adaabab
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Mon Jan 19 09:11:51 2026 +0800

    [SPARK-54987][PYTHON][DOCS][FOLLOW-UP] Update the docstring of `to_arrow_type`
    
    ### What changes were proposed in this pull request?
    Update the docstring of `to_arrow_type`
    
    ### Why are the changes needed?
    1. Fix the comment about the default value.
    2. Explain why `unit` is ignored in this function.
    
    ### Does this PR introduce _any_ user-facing change?
    No, doc-only
    
    ### How was this patch tested?
    ci
    
    ### Was this patch authored or co-authored using generative AI tooling?
    no
    
    Closes #53797 from zhengruifeng/doc_from_arrow_type.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 python/pyspark/sql/pandas/types.py | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/pandas/types.py b/python/pyspark/sql/pandas/types.py
index 7be8e5942494..e7a208f200ac 100644
--- a/python/pyspark/sql/pandas/types.py
+++ b/python/pyspark/sql/pandas/types.py
@@ -330,10 +330,19 @@ def from_arrow_type(
     at : :class:`pyarrow.DataType`
         pyarrow data type
     prefer_timestamp_ntz: bool, default True
-        When the input timezone is None, whether to convert it to timezone-aware TimestampType.
-        By default, the to_arrow_type convert timezone-naive TimestampNTZType to pa.timestamp
-        without timezone. So it only make sense to set it in case like creating dataframe
+        When the input timezone is None, whether to convert it to timezone-naive TimestampNTZType.
+        The to_arrow_type always convert timezone-naive TimestampNTZType to pa.timestamp
+        without timezone. The default value is True, so that
+        from_arrow_type(to_arrow_type(TimestampNTZType)) returns TimestampNTZType.
+        It only makes sense to explicitly set it in case like creating dataframe
         from arrow/pandas data according to config `spark.sql.timestampType`.
+
+    Notes
+    -----
+    Different from JVM side ArrowUtils.fromArrowType, the unit ('ns'/'us'/etc) in types
+    pa.timestamp/pa.duration/pa.time64 is ignored in this function.
+    That is because this function is also used in data type inference in creating dataframe
+    from arrow/pandas data, in which the input data may use a different unit.
     """
 
     import pyarrow.types as types
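
The round trip and the unit handling described in the updated docstring can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ArrowTimestamp`, `to_arrow_type_sketch` and `from_arrow_type_sketch` are hypothetical stand-ins written for this example, Spark types are represented by their names as plain strings, and none of this is the code in `python/pyspark/sql/pandas/types.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrowTimestamp:
    """Hypothetical stand-in for pa.timestamp(unit, tz); not a pyarrow class."""
    unit: str = "us"
    tz: Optional[str] = None


def to_arrow_type_sketch(spark_type: str) -> ArrowTimestamp:
    """Toy model: TimestampNTZType always maps to a timestamp without timezone."""
    if spark_type == "TimestampNTZType":
        return ArrowTimestamp(unit="us", tz=None)
    if spark_type == "TimestampType":
        # Assumption for this sketch: timezone-aware timestamps carry tz="UTC".
        return ArrowTimestamp(unit="us", tz="UTC")
    raise ValueError(f"unsupported type in this sketch: {spark_type}")


def from_arrow_type_sketch(at: ArrowTimestamp, prefer_timestamp_ntz: bool = True) -> str:
    """Toy model: only `tz` and the flag are inspected; `at.unit` is ignored."""
    if at.tz is None and prefer_timestamp_ntz:
        return "TimestampNTZType"
    return "TimestampType"


# With the default flag, the NTZ type survives the round trip.
round_trip = from_arrow_type_sketch(to_arrow_type_sketch("TimestampNTZType"))
# A nanosecond input infers the same type as a microsecond one.
nanos = from_arrow_type_sketch(ArrowTimestamp(unit="ns", tz=None))
# Explicitly disabling the flag maps a timezone-less input to TimestampType.
forced = from_arrow_type_sketch(ArrowTimestamp(unit="us", tz=None), prefer_timestamp_ntz=False)
print(round_trip, nanos, forced)
```

In this model the flag only matters when the input has no timezone, which matches the docstring's remark that setting it explicitly only makes sense when creating a dataframe from arrow/pandas data according to `spark.sql.timestampType`.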


