This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 236e71d67540 [SPARK-54987][PYTHON][DOCS][FOLLOW-UP] Update the docstring of `to_arrow_type`
236e71d67540 is described below
commit 236e71d67540139c1d6fca8d62a643cd9adaabab
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Mon Jan 19 09:11:51 2026 +0800
[SPARK-54987][PYTHON][DOCS][FOLLOW-UP] Update the docstring of `to_arrow_type`
### What changes were proposed in this pull request?
Update the docstring of `to_arrow_type`
### Why are the changes needed?
1, fix the comment about default value;
2, explain why `unit` is ignored in this function;
### Does this PR introduce _any_ user-facing change?
No, doc-only
### How was this patch tested?
ci
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #53797 from zhengruifeng/doc_from_arrow_type.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
python/pyspark/sql/pandas/types.py | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/python/pyspark/sql/pandas/types.py b/python/pyspark/sql/pandas/types.py
index 7be8e5942494..e7a208f200ac 100644
--- a/python/pyspark/sql/pandas/types.py
+++ b/python/pyspark/sql/pandas/types.py
@@ -330,10 +330,19 @@ def from_arrow_type(
at : :class:`pyarrow.DataType`
pyarrow data type
prefer_timestamp_ntz: bool, default True
- When the input timezone is None, whether to convert it to timezone-aware TimestampType.
- By default, the to_arrow_type convert timezone-naive TimestampNTZType to pa.timestamp
- without timezone. So it only make sense to set it in case like creating dataframe
+ When the input timezone is None, whether to convert it to timezone-naive TimestampNTZType.
+ The to_arrow_type always convert timezone-naive TimestampNTZType to pa.timestamp
+ without timezone. The default value is True, so that
+ from_arrow_type(to_arrow_type(TimestampNTZType)) returns TimestampNTZType.
+ It only makes sense to explicitly set it in case like creating dataframe
from arrow/pandas data according to config `spark.sql.timestampType`.
+
+ Notes
+ -----
+ Different from JVM side ArrowUtils.fromArrowType, the unit ('ns'/'us'/etc) in types
+ pa.timestamp/pa.duration/pa.time64 is ignored in this function.
+ That is because this function is also used in data type inference in creating dataframe
+ from arrow/pandas data, in which the input data may use a different unit.
"""
import pyarrow.types as types
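
The behaviour described in the updated docstring can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the PySpark implementation: `ArrowTimestamp`, `to_arrow_type_model` and `from_arrow_type_model` are hypothetical stand-ins for `pa.timestamp`, `to_arrow_type` and `from_arrow_type`, Spark types are represented by their names as strings, and the `"UTC"` timezone used for `TimestampType` is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrowTimestamp:
    """Hypothetical stand-in for pa.timestamp(unit, tz)."""

    unit: str = "us"
    tz: Optional[str] = None


def to_arrow_type_model(spark_type: str) -> ArrowTimestamp:
    """Toy model of to_arrow_type, restricted to the two timestamp types."""
    if spark_type == "TimestampNTZType":
        # Timezone-naive Spark type always maps to a timestamp without tz.
        return ArrowTimestamp(unit="us", tz=None)
    if spark_type == "TimestampType":
        # Assumption for this sketch: the timezone-aware type carries "UTC".
        return ArrowTimestamp(unit="us", tz="UTC")
    raise ValueError(f"unsupported type in this sketch: {spark_type}")


def from_arrow_type_model(at: ArrowTimestamp, prefer_timestamp_ntz: bool = True) -> str:
    """Toy model of from_arrow_type for timestamps.

    The unit of `at` is deliberately never inspected, mirroring the note
    that 'ns'/'us'/etc. is ignored during conversion and type inference.
    """
    if at.tz is None and prefer_timestamp_ntz:
        return "TimestampNTZType"
    return "TimestampType"


# Round trip with the default value True.
print(from_arrow_type_model(to_arrow_type_model("TimestampNTZType")))

# Explicitly opting out, as when creating a dataframe from arrow/pandas data.
print(from_arrow_type_model(ArrowTimestamp(unit="ns"), prefer_timestamp_ntz=False))
```

In this model the default of `True` is what makes the round trip return `TimestampNTZType`, and passing `False` is the case the docstring reserves for dataframe creation governed by `spark.sql.timestampType`.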