parker-cassar opened a new pull request, #50325: URL: https://github.com/apache/arrow/pull/50325
### Rationale for this change

Converting a Table with an `arrow.uuid` extension column to pandas currently produces a column of `bytes` instead of `uuid.UUID` objects. This happens because `UuidType` does not implement `to_pandas_dtype()`, so `Table.to_pandas()` falls back to the storage type (`fixed_size_binary(16)`) and produces `bytes`. The bug occurs even without a Parquet roundtrip.

Note: the original issue suggested this might be specific to Python 3.14, but I tested on Python 3.10 through 3.14 and reproduced it on every version, since `UuidType` has never implemented `to_pandas_dtype()`.

### What changes are included in this PR?

Added `UuidType.to_pandas_dtype()`, which returns a dtype wrapper implementing `__from_arrow__`. The wrapper delegates to `to_pylist()`, since `UuidScalar.as_py()` already produces `uuid.UUID` objects.

### Are these changes tested?

Yes. Added `test_uuid_roundtrip`, which covers the full path: pandas DataFrame with a UUID column -> pyarrow Table -> Parquet on disk -> pyarrow Table -> pandas DataFrame. The final conversion is what this PR fixes.

### Are there any user-facing changes?

Yes. `Table.to_pandas()` now returns `uuid.UUID` objects for `arrow.uuid` columns instead of `bytes`.

* GitHub Issue: #50312
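
For illustration only, here is a minimal, pyarrow-free sketch of the delegation idea described in this PR. `FakeUuidArray` and `UuidDtypeSketch` are hypothetical stand-ins, not the classes added in the PR; the only assumption carried over from the description is that a dtype object exposes `__from_arrow__` and that the array's `to_pylist()` already yields `uuid.UUID` values. A real pandas dtype would return an extension array rather than a plain list.

```python
import uuid


class FakeUuidArray:
    """Hypothetical stand-in for an Arrow array of UUID values.

    Each element is stored as 16 raw bytes (or None for a null),
    mirroring the fixed_size_binary(16) storage type.
    """

    def __init__(self, storage):
        self._storage = list(storage)

    def storage_to_pylist(self):
        # What the fallback path effectively exposes: raw storage bytes.
        return list(self._storage)

    def to_pylist(self):
        # Element-wise conversion to uuid.UUID, preserving nulls.
        return [None if b is None else uuid.UUID(bytes=b) for b in self._storage]


class UuidDtypeSketch:
    """Hypothetical dtype wrapper exposing the __from_arrow__ hook."""

    def __from_arrow__(self, array):
        # Delegate to to_pylist(), which already produces uuid.UUID objects.
        return array.to_pylist()


raw = [uuid.UUID(int=1).bytes, None, uuid.UUID(int=2).bytes]
arr = FakeUuidArray(raw)

fallback = arr.storage_to_pylist()                  # bytes values
converted = UuidDtypeSketch().__from_arrow__(arr)   # uuid.UUID values
```

In this sketch `fallback` holds 16-byte `bytes` values while `converted` holds `uuid.UUID` objects, which mirrors the before/after behavior the PR describes for `Table.to_pandas()`.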
