pitrou commented on code in PR #36290:
URL: https://github.com/apache/arrow/pull/36290#discussion_r1254402322
##########
python/pyarrow/_parquet.pyx:
##########
@@ -1599,6 +1610,14 @@ cdef shared_ptr[WriterProperties] _create_writer_properties(
# a size larger than this then it will be latched to this value.
props.max_row_group_length(_MAX_ROW_GROUP_SIZE)
+ # page index
+
+ if isinstance(write_page_index, bool):
Review Comment:
Why ignore the value if it's not boolean? This makes the API confusing.
##########
python/pyarrow/tests/parquet/test_metadata.py:
##########
@@ -357,6 +357,20 @@ def test_field_id_metadata():
assert schema[5].metadata[field_id] == b'-1000'
+def test_parquet_file_page_index():
+ table = pa.table({'a': [1, 2, 3]})
+
+ writer = pa.BufferOutputStream()
+ _write_table(table, writer, write_page_index=True)
+ reader = pa.BufferReader(writer.getvalue())
+
+ # Can retrieve sorting columns from metadata
+ metadata = pq.read_metadata(reader)
+ cc = metadata.row_group(0).column(0)
+ assert cc.has_offset_index is True
+ assert cc.has_column_index is True
Review Comment:
Ok, but can we also have a test where these properties are false?
--
This is an automated message from the Apache Git Service.