parker-cassar opened a new pull request, #50325:
URL: https://github.com/apache/arrow/pull/50325

   ### Rationale for this change
   
   Converting a Table with an `arrow.uuid` extension column to pandas currently 
produces a column of `bytes` instead of `uuid.UUID` objects. `UuidType` does 
not implement `to_pandas_dtype()`, so `Table.to_pandas()` falls back to the 
storage type (`fixed_size_binary(16)`). The bug occurs even without a Parquet 
roundtrip.
   
   Note: the original issue suggested this might be specific to Python 3.14, 
but I reproduced it on Python 3.10 through 3.14. It is not version-specific, 
because `UuidType` has never implemented `to_pandas_dtype()`.
   
   ### What changes are included in this PR?
   
   Added `UuidType.to_pandas_dtype()`: returns a dtype wrapper implementing 
`__from_arrow__`, which delegates to `to_pylist()` since `UuidScalar.as_py()` 
already produces `uuid.UUID` objects.
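   
   To illustrate the delegation described above, here is a minimal, 
dependency-free sketch. It is not the code in this PR: `FakeUuidArray` and 
`UuidDtypeSketch` are hypothetical stand-ins, and a real `__from_arrow__` must 
return a pandas array, not the plain list used here.

```python
import uuid


class FakeUuidArray:
    """Hypothetical stand-in for a pyarrow array of `arrow.uuid` values."""

    def __init__(self, raw_values):
        # Each element is 16 raw bytes (the storage representation) or None.
        self._raw_values = list(raw_values)

    def to_pylist(self):
        # Assumption: mirrors the behavior described above, where each
        # element is converted to a uuid.UUID and nulls stay None.
        return [
            None if raw is None else uuid.UUID(bytes=raw)
            for raw in self._raw_values
        ]


class UuidDtypeSketch:
    """Hypothetical dtype wrapper; the class added by the PR may differ."""

    def __from_arrow__(self, array):
        # Delegate element conversion to the array itself. A real
        # implementation would wrap this result in a pandas array.
        return array.to_pylist()


original = uuid.UUID("12345678-1234-5678-1234-567812345678")
converted = UuidDtypeSketch().__from_arrow__(
    FakeUuidArray([original.bytes, None])
)
```

   The point of the design is that the dtype wrapper holds no conversion 
logic of its own. It reuses the element-wise conversion the array already 
provides.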
   
   ### Are these changes tested?
   
   Yes. Added `test_uuid_roundtrip`, which covers the full roundtrip: pandas 
DataFrame with a UUID column -> pyarrow Table -> Parquet on disk -> pyarrow 
Table -> pandas DataFrame. The final conversion is what this PR fixes.
   
   ### Are there any user-facing changes?
   
   Yes. `Table.to_pandas()` now returns `uuid.UUID` for `arrow.uuid` columns 
instead of `bytes`.
   
   * GitHub Issue: #50312

