paleolimbot opened a new pull request, #46772:
URL: https://github.com/apache/arrow/pull/46772
### Rationale for this change
The Parquet C++ implementation now supports reading four logical types
(JSON, UUID, Geometry, Geography) as Arrow extension types; however, users have
to opt in to avoid losing the logical type on read.
### What changes are included in this PR?
This PR sets the default value of `arrow_extensions_enabled` to `True` (in
Python).
### Are these changes tested?
Yes, the behaviour of `arrow_extensions_enabled` was already tested (and
tests were updated to reflect the new default value).
### Are there any user-facing changes?
**This PR includes breaking changes to public APIs.**
Reading a Parquet file that contains a JSON or UUID logical type will now
produce an extension type rather than string or fixed-size binary, respectively.
Python users who relied on the previous behaviour will have to
explicitly cast to storage after this PR:
```python
import uuid
import pyarrow as pa
json_array = pa.array(['{"k": "v"}'], pa.json_())
json_array.cast(pa.string())
#> [
#> "{"k": "v"}"
#> ]
uuid_array = pa.array([uuid.uuid4().bytes], pa.uuid())
uuid_array.cast(pa.binary(16))
#> <pyarrow.lib.FixedSizeBinaryArray object at 0x11e42b1c0>
#> [
#> 746C1022AB434A97972E1707EC3EE8F4
#> ]
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.