paleolimbot opened a new pull request, #46772:
URL: https://github.com/apache/arrow/pull/46772

   ### Rationale for this change
   
   The Parquet C++ implementation now supports reading four logical types 
(JSON, UUID, Geometry, Geography) as Arrow extension types; however, users have 
to opt in to avoid losing the logical type on read.
   
   ### What changes are included in this PR?
   
   This PR sets the default value of `arrow_extensions_enabled` to `True` (in 
Python).
   
   ### Are these changes tested?
   
   Yes, the behaviour of `arrow_extensions_enabled` was already tested (and 
tests were updated to reflect the new default value).
   
   ### Are there any user-facing changes?
   
   **This PR includes breaking changes to public APIs.**
   
   Reading a Parquet file that contains a JSON or UUID logical type will now 
produce an extension type rather than string or fixed size binary, respectively. 
Python users who relied on the previous behaviour will have to 
explicitly cast to storage after this PR:
   
   ```python
   import uuid
   import pyarrow as pa
   
   json_array = pa.array(['{"k": "v"}'], pa.json_())
   json_array.cast(pa.string())
   #> [
   #>   "{"k": "v"}"
   #> ]
   
   uuid_array = pa.array([uuid.uuid4().bytes], pa.uuid())
   uuid_array.cast(pa.binary(16))
   #> <pyarrow.lib.FixedSizeBinaryArray object at 0x11e42b1c0>
   #> [
   #>   746C1022AB434A97972E1707EC3EE8F4
   #> ]
   ```


