kczimm opened a new pull request, #8071: URL: https://github.com/apache/arrow-rs/pull/8071
# Which issue does this PR close?

- Closes #8070.

# Rationale for this change

This change introduces a more flexible way to handle page indexes (column and offset indexes) in Parquet files. Previously, the reading of these indexes was controlled by boolean flags, which could express only two states: read the index and require it, or do not read it. The new `PageIndexPolicy` enum (`Off`, `Optional`, `Required`) provides finer control, allowing users to specify whether an index is not read, read if present (without error if missing), or strictly required (error if missing).

# What changes are included in this PR?

- Introduced a new `PageIndexPolicy` enum with `Off`, `Optional`, and `Required` variants.
- Replaced the boolean `column_index` and `offset_index` fields in `ParquetMetaDataReader` with the new `PageIndexPolicy` enum.
- Updated the `ParquetMetaDataReader::new()` function to initialize page index policies to `Off`, preserving previous defaults.
- Modified the existing `with_page_indexes`, `with_column_indexes`, and `with_offset_indexes` methods to use the new `PageIndexPolicy`, defaulting to `Required` when enabling indexes.
- Added new methods `with_page_index_policy`, `with_column_index_policy`, and `with_offset_index_policy` to allow setting the page index policy directly.
- Adjusted the internal logic for parsing column and offset indexes to respect the specified `PageIndexPolicy`, including returning an error if a `Required` index is not found.

# Are these changes tested?

Yes, a new test file `parquet/tests/page_index.rs` has been added to cover the functionality of the new `PageIndexPolicy` and its integration with `ParquetMetaDataReader`.

# Are there any user-facing changes?

Yes, there are user-facing changes to the `ParquetMetaDataReader` API. The `with_column_indexes` and `with_offset_indexes` methods now implicitly use `PageIndexPolicy::Required` when enabling page indexes. New methods `with_page_index_policy`, `with_column_index_policy`, and `with_offset_index_policy` have been added.
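To illustrate the three policy semantics described in the rationale, here is a minimal, self-contained sketch. It is not the code from this PR: the enum only mirrors the variant names given above, while `MissingIndex`, `apply_policy`, and `policy_from_flag` are hypothetical names invented for this example, and `found` stands in for whatever the reader actually locates in the file.

```rust
/// Mirrors the variant names listed in this PR. The real enum lives in the
/// parquet crate; this copy exists only to keep the sketch self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageIndexPolicy {
    /// Do not read the index at all.
    Off,
    /// Read the index if present; a missing index is not an error.
    Optional,
    /// Read the index; a missing index is an error.
    Required,
}

/// Hypothetical error type for this sketch, carrying the name of the
/// index that was required but not found.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingIndex(pub &'static str);

/// Hypothetical helper showing how a policy could decide the outcome.
/// `found` models the result of looking for an index in the file.
pub fn apply_policy<T>(
    policy: PageIndexPolicy,
    name: &'static str,
    found: Option<T>,
) -> Result<Option<T>, MissingIndex> {
    match policy {
        // Off: ignore the index even if the file contains one.
        PageIndexPolicy::Off => Ok(None),
        // Optional: pass through whatever was found, present or not.
        PageIndexPolicy::Optional => Ok(found),
        // Required: absence becomes an error.
        PageIndexPolicy::Required => found.map(Some).ok_or(MissingIndex(name)),
    }
}

/// Hypothetical mapping for the boolean-style methods, following the
/// description above: enabling selects `Required`, disabling selects `Off`.
pub fn policy_from_flag(enabled: bool) -> PageIndexPolicy {
    if enabled {
        PageIndexPolicy::Required
    } else {
        PageIndexPolicy::Off
    }
}
```

Under this model, `Optional` is the only state the old boolean flags could not express: `policy_from_flag` can return `Off` or `Required`, but never `Optional`, which is why the new `with_*_policy` methods are needed.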