aboderinsamuel opened a new pull request, #50145:
URL: https://github.com/apache/arrow/pull/50145

   ### Rationale for this change
   
   `FixedShapeTensorType.to_pandas_dtype()` inherited the base `DataType`
   implementation, which raises `NotImplementedError` for every extension type.
   This contradicted the documented public API, and it also blocked
   `Table.to_pandas(split_blocks=True)` for fixed-shape-tensor columns: with no
   pandas dtype available, the split-blocks path emitted an extension
   (`py_array`) block with no matching dtype and crashed with `KeyError` in
   `_reconstruct_block`.
   
   (See the discussion in #49907 and the related #33134 on how extension arrays
   should convert to pandas in general.)
   
   ### What changes are included in this PR?
   
   Implement `to_pandas_dtype()` on `FixedShapeTensorType` to return
   `pandas.ArrowDtype(self)`. `ArrowDtype` is a pandas `ExtensionDtype` that
   implements `__from_arrow__`, which is exactly what the `pandas_compat`
   extension-block path requires to build the column — so no conversion code
   needed to change.
   
   On pandas `< 1.5` (no `ArrowDtype`), the method falls back to raising
   `NotImplementedError`, leaving behavior on older pandas unchanged.
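
The control flow described above can be sketched roughly as follows. This is
an illustration only, not the code in this PR: `to_pandas_dtype_sketch`,
`FakeArrowDtype`, and the two fake "pandas" namespaces are hypothetical
stand-ins so that the example runs without pandas or pyarrow installed. The
use of `getattr` for feature detection is also an assumption; the actual
change may check the pandas version differently.

```python
from types import SimpleNamespace


def to_pandas_dtype_sketch(arrow_type, pandas_module):
    """Hypothetical stand-in for the control flow of the new method."""
    # Assumption: availability is detected by looking for the attribute.
    # pandas older than 1.5 does not expose ArrowDtype at all.
    arrow_dtype_cls = getattr(pandas_module, "ArrowDtype", None)
    if arrow_dtype_cls is None:
        # Older pandas: keep the previous behavior and raise.
        raise NotImplementedError("pandas.ArrowDtype is not available")
    # Newer pandas: wrap the Arrow type in an ArrowDtype.
    return arrow_dtype_cls(arrow_type)


class FakeArrowDtype:
    """Hypothetical placeholder for pandas.ArrowDtype."""

    def __init__(self, pyarrow_dtype):
        self.pyarrow_dtype = pyarrow_dtype


# Fake "pandas" modules: one with ArrowDtype, one without.
new_pandas = SimpleNamespace(ArrowDtype=FakeArrowDtype)
old_pandas = SimpleNamespace()

# With ArrowDtype available, the type is wrapped rather than converted.
wrapped = to_pandas_dtype_sketch("tensor_type_placeholder", new_pandas)
```

With the real libraries, the first argument would be the
`FixedShapeTensorType` instance itself and the second the imported `pandas`
module; passing `old_pandas` here raises `NotImplementedError`, mirroring the
fallback for older pandas.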
   
   ### Are these changes tested?
   
   Yes. `test_tensor_type_to_pandas` in
   `python/pyarrow/tests/test_extension_type.py` asserts that:
   - `to_pandas_dtype()` returns a `pd.ArrowDtype` wrapping the type,
   - `Array.to_pandas()` produces an ArrowDtype-backed column,
   - `Table.to_pandas()` with both `split_blocks=False` and `split_blocks=True`
     (the case from #49907) round-trips correctly,
   
   parametrized over value types (`int8`, `float32`, `float64`) and shapes
   including a permutation. It is gated to pandas `>= 2.1.0`, matching the
   existing `pd.ArrowDtype` extension-block tests (GH-35821).
   
   ### Are there any user-facing changes?
   
   Yes:
   - `FixedShapeTensorType.to_pandas_dtype()` now returns
     `pandas.ArrowDtype(...)` instead of raising `NotImplementedError`
     (pandas `>= 1.5`).
   - Consequently, `Array.to_pandas()` / `Table.to_pandas()` on a
     fixed-shape-tensor column now yield an `ArrowDtype`-backed column instead
     of an `object`-dtype column of flattened ndarrays (pandas `>= 2.1`). Code
     relying on the previous `object` dtype will observe this change.

