Copilot commented on code in PR #50145:
URL: https://github.com/apache/arrow/pull/50145#discussion_r3386565216
##########
python/pyarrow/types.pxi:
##########
@@ -2049,6 +2049,30 @@ cdef class FixedShapeTensorType(BaseExtensionType):
         else:
             return None
+    def to_pandas_dtype(self):
+        """
+        Return the equivalent pandas dtype, an instance of
+        :class:`pandas.ArrowDtype` wrapping this extension type.
+
+        Each value of the resulting pandas column is a tensor with this
+        type's ``shape``. Returning a pandas extension dtype (rather than a
+        NumPy dtype) is what lets ``Table.to_pandas(split_blocks=True)``
+        build an extension block for this type.
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> pa.fixed_shape_tensor(pa.int32(), [2, 2]).to_pandas_dtype()
+        extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>[pyarrow]
+        """
+        import pandas as pd
+        if not hasattr(pd, "ArrowDtype"):
+            # pandas < 1.5 has no ArrowDtype able to hold tensors, so keep the
+            # documented fallback. Conversion code catches this and produces an
+            # object-dtype column instead.
+            raise NotImplementedError(str(self))
+        return pd.ArrowDtype(self)
Review Comment:
The new behavior is gated only on `hasattr(pd, "ArrowDtype")`, but the added
test suite explicitly skips pandas `< 2.1.0` because ArrowDtype extension
blocks are “only reliable from 2.1.0”. This mismatch means users on pandas
1.5–2.0 will now get `pd.ArrowDtype(self)` even though the test suite suggests
behavior may be unreliable there. Consider aligning runtime behavior with the
tested support window (e.g., raise the same fallback `NotImplementedError` for
pandas versions below the minimum reliable version, or add a more precise
feature/version check that matches the motivation for the skip).
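One way to implement the suggested check is sketched below. This is a hedged sketch, not pyarrow code: `MIN_RELIABLE_PANDAS`, `parse_major_minor` and `tensor_dtype_supported` are hypothetical names, the `(2, 1)` threshold is taken from the test skip mentioned in this comment, and the version string is parsed with a small regex instead of `packaging.version` so the snippet needs only the standard library. The decision function takes the pandas version and the `hasattr` result as plain arguments, which keeps it testable without pandas installed.

```python
import re

# Hypothetical threshold: the pandas release the added tests treat as the
# first one where ArrowDtype extension blocks are reliable (2.1.0).
MIN_RELIABLE_PANDAS = (2, 1)


def parse_major_minor(version):
    """Return (major, minor) from strings like '2.1.4' or '3.0.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized pandas version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def tensor_dtype_supported(pandas_version, has_arrow_dtype):
    """True only if pandas has ArrowDtype AND is inside the tested window."""
    if not has_arrow_dtype:
        return False
    return parse_major_minor(pandas_version) >= MIN_RELIABLE_PANDAS


# Hypothetical use inside FixedShapeTensorType.to_pandas_dtype():
#     if not tensor_dtype_supported(pd.__version__, hasattr(pd, "ArrowDtype")):
#         raise NotImplementedError(str(self))
#     return pd.ArrowDtype(self)

for v in ("1.5.3", "2.0.3", "2.1.0", "3.0.0rc1"):
    # prints False for 1.5.3 and 2.0.3, True for 2.1.0 and 3.0.0rc1
    print(v, tensor_dtype_supported(v, has_arrow_dtype=True))
```

Because the fallback is still `NotImplementedError`, the existing conversion path that catches it and builds an object-dtype column would keep working for users on pandas 1.5–2.0.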
##########
python/pyarrow/types.pxi:
##########
@@ -2049,6 +2049,30 @@ cdef class FixedShapeTensorType(BaseExtensionType):
         else:
             return None
+    def to_pandas_dtype(self):
+        """
+        Return the equivalent pandas dtype, an instance of
+        :class:`pandas.ArrowDtype` wrapping this extension type.
+
+        Each value of the resulting pandas column is a tensor with this
+        type's ``shape``. Returning a pandas extension dtype (rather than a
+        NumPy dtype) is what lets ``Table.to_pandas(split_blocks=True)``
+        build an extension block for this type.
+
+        Examples
+        --------
+        >>> import pyarrow as pa
+        >>> pa.fixed_shape_tensor(pa.int32(), [2, 2]).to_pandas_dtype()
+        extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>[pyarrow]
+        """
+        import pandas as pd
+        if not hasattr(pd, "ArrowDtype"):
+            # pandas < 1.5 has no ArrowDtype able to hold tensors, so keep the
+            # documented fallback. Conversion code catches this and produces an
+            # object-dtype column instead.
+            raise NotImplementedError(str(self))
Review Comment:
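`NotImplementedError(str(self))` is hard to interpret for users: the message is
only the type string and doesn’t explain why pandas dtype conversion isn’t
supported (the installed pandas lacks `ArrowDtype`, i.e. it predates 1.5; a
missing pandas install surfaces earlier, as an `ImportError` from
`import pandas as pd`). Consider raising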
`NotImplementedError` with a message that states the required pandas
feature/version (and optionally the detected pandas version) so callers can
take action without needing to infer it from the type string.
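A sketch of a more descriptive error, assuming the conversion code catches `NotImplementedError` by type (as the comment in the diff describes) and does not parse the message text. `unsupported_tensor_dtype_error` and `MIN_PANDAS_FOR_ARROWDTYPE` are hypothetical names; `"1.5"` reflects the release that introduced `pandas.ArrowDtype` and would change if the stricter version window from the previous comment is adopted.

```python
# Hypothetical constant: the first pandas release that ships pandas.ArrowDtype.
MIN_PANDAS_FOR_ARROWDTYPE = "1.5"


def unsupported_tensor_dtype_error(type_str, pandas_version):
    """Build a NotImplementedError that names the missing feature and the fix.

    type_str is what str(self) returns for the extension type; pandas_version
    is the detected pd.__version__.
    """
    message = (
        f"{type_str} has no pandas dtype equivalent with pandas "
        f"{pandas_version}: pandas.ArrowDtype is required "
        f"(pandas >= {MIN_PANDAS_FOR_ARROWDTYPE}). Upgrade pandas to get an "
        "ArrowDtype column; otherwise pyarrow's pandas conversion falls back "
        "to an object-dtype column."
    )
    return NotImplementedError(message)


# Hypothetical use inside FixedShapeTensorType.to_pandas_dtype():
#     if not hasattr(pd, "ArrowDtype"):
#         raise unsupported_tensor_dtype_error(str(self), pd.__version__)

err = unsupported_tensor_dtype_error(
    "extension<arrow.fixed_shape_tensor[value_type=int32, shape=[2,2]]>", "1.4.4"
)
print(err)
```

Keeping `str(self)` at the start of the message preserves everything the current error already reports while adding the cause and the required version.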
--
This is an automated message from the Apache Git Service.