rok commented on PR #44070:
URL: https://github.com/apache/arrow/pull/44070#issuecomment-2428700815
@pitrou
> What's the plan for the Parquet `arrow_extensions_enabled` option?
Perhaps we should open a separate issue for it? The current implementation
seems to round-trip through Parquet fine.
I'd propose something like this:
```diff
diff --git a/python/pyarrow/_parquet.pxd b/python/pyarrow/_parquet.pxd
index d6aebd8284..32e2618ecf 100644
--- a/python/pyarrow/_parquet.pxd
+++ b/python/pyarrow/_parquet.pxd
@@ -405,6 +405,7 @@ cdef extern from "parquet/api/reader.h" namespace "parquet" nogil:
CCacheOptions cache_options() const
void set_coerce_int96_timestamp_unit(TimeUnit unit)
TimeUnit coerce_int96_timestamp_unit() const
+ void set_arrow_extensions_enabled(c_bool enabled)
ArrowReaderProperties default_arrow_reader_properties()
diff --git a/python/pyarrow/_parquet.pyx b/python/pyarrow/_parquet.pyx
index 254bfe3b09..6ae1726c71 100644
--- a/python/pyarrow/_parquet.pyx
+++ b/python/pyarrow/_parquet.pyx
@@ -1441,7 +1441,8 @@ cdef class ParquetReader(_Weakrefable):
FileDecryptionProperties decryption_properties=None,
thrift_string_size_limit=None,
thrift_container_size_limit=None,
- page_checksum_verification=False):
+ page_checksum_verification=False,
+ arrow_extensions_enabled=False):
"""
Open a parquet file for reading.
@@ -1458,6 +1459,7 @@ cdef class ParquetReader(_Weakrefable):
thrift_string_size_limit : int, optional
thrift_container_size_limit : int, optional
page_checksum_verification : bool, default False
+ arrow_extensions_enabled : bool, default False
"""
cdef:
shared_ptr[CFileMetaData] c_metadata
@@ -1522,6 +1524,9 @@ cdef class ParquetReader(_Weakrefable):
if read_dictionary is not None:
self._set_read_dictionary(read_dictionary, &arrow_props)
+ if arrow_extensions_enabled:
+ arrow_props.set_arrow_extensions_enabled(<c_bool>True)
+
with nogil:
check_status(builder.memory_pool(self.pool)
.properties(arrow_props)
```
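
To make the intended control flow easier to follow, here is a minimal pure-Python sketch of what the diff above does. It is not pyarrow code: `StubArrowReaderProperties`, `StubParquetReader` and the `arrow_extensions_enabled()` getter are hypothetical stand-ins, and only the `set_arrow_extensions_enabled` setter and the `arrow_extensions_enabled=False` keyword come from the proposal.

```python
class StubArrowReaderProperties:
    """Hypothetical stand-in for the C++ ArrowReaderProperties object."""

    def __init__(self):
        # Assumed here: the option is off unless explicitly enabled.
        self._arrow_extensions_enabled = False

    def set_arrow_extensions_enabled(self, enabled):
        # Same setter name as the one declared in the proposed .pxd change.
        self._arrow_extensions_enabled = bool(enabled)

    def arrow_extensions_enabled(self):
        # Hypothetical getter, added only so the sketch can be inspected.
        return self._arrow_extensions_enabled


class StubParquetReader:
    """Hypothetical stand-in for ParquetReader; it never opens a file."""

    def open(self, source, page_checksum_verification=False,
             arrow_extensions_enabled=False):
        arrow_props = StubArrowReaderProperties()
        # Mirrors the proposed diff: touch the property only on opt-in.
        if arrow_extensions_enabled:
            arrow_props.set_arrow_extensions_enabled(True)
        self.arrow_props = arrow_props
        return self


# The file name is a placeholder; nothing is read from disk.
default_reader = StubParquetReader().open("example.parquet")
opted_in_reader = StubParquetReader().open(
    "example.parquet", arrow_extensions_enabled=True)
```

Because the keyword defaults to `False`, existing callers would see no behaviour change in this sketch; only callers that pass `arrow_extensions_enabled=True` reach the setter.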