kczimm opened a new pull request, #8071:
URL: https://github.com/apache/arrow-rs/pull/8071

   # Which issue does this PR close?
   
   - Closes #8070.
   
   # Rationale for this change
   
   This change introduces a more flexible way to handle page indexes (column 
and offset indexes) in Parquet files. Previously, reading these indexes was 
controlled by boolean flags that could express only two states: read the index 
and require it, or do not read it. The new `PageIndexPolicy` enum (`Off`, 
`Optional`, `Required`) provides finer control: an index can be skipped, read 
if present (no error if missing), or strictly required (error if missing).
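
The three policies can be modelled in a few lines. This is a minimal, self-contained sketch of the semantics described above, not the code in this PR: the enum is a local stand-in for `PageIndexPolicy`, and `resolve_index` and its `index_present` parameter are hypothetical names introduced only for illustration.

```rust
/// Local stand-in for the `PageIndexPolicy` enum described above
/// (illustrative only; not the crate's definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageIndexPolicy {
    /// Do not read the index at all.
    Off,
    /// Read the index if the file has one; a missing index is not an error.
    Optional,
    /// Read the index; a missing index is an error.
    Required,
}

/// Hypothetical helper: given a policy and whether the file contains the
/// index, decide whether the index gets loaded.
///
/// Returns `Ok(true)` when the index should be loaded, `Ok(false)` when it
/// is skipped, and `Err(..)` when a required index is missing.
fn resolve_index(policy: PageIndexPolicy, index_present: bool) -> Result<bool, String> {
    match (policy, index_present) {
        // `Off` never reads, regardless of what the file contains.
        (PageIndexPolicy::Off, _) => Ok(false),
        // `Optional` reads exactly when the index exists.
        (PageIndexPolicy::Optional, present) => Ok(present),
        // `Required` reads when present and fails otherwise.
        (PageIndexPolicy::Required, true) => Ok(true),
        (PageIndexPolicy::Required, false) => {
            Err("required page index is missing".to_string())
        }
    }
}
```

In this model, only the `(Required, false)` combination produces an error. That is the case the old boolean flags could not distinguish from a tolerated missing index.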
   
   # What changes are included in this PR?
   
   - Introduced a new `PageIndexPolicy` enum with `Off`, `Optional`, and 
`Required` variants.
   - Replaced the boolean `column_index` and `offset_index` fields in 
`ParquetMetaDataReader` with the new `PageIndexPolicy` enum.
   - Updated the `ParquetMetaDataReader::new()` function to initialize page 
index policies to `Off`, preserving previous defaults.
   - Modified existing `with_page_indexes`, `with_column_indexes`, and 
`with_offset_indexes` methods to utilize the new `PageIndexPolicy`, defaulting 
to `Required` when enabling indexes.
   - Added new methods: `with_page_index_policy`, `with_column_index_policy`, 
and `with_offset_index_policy` to allow direct setting of the page index policy.
   - Adjusted the internal logic for parsing column and offset indexes to 
respect the specified `PageIndexPolicy`, including returning an error if a 
`Required` index is not found.
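
The relationship between the boolean methods and the new policy methods listed above can be sketched as follows. This is a simplified model under stated assumptions, not the PR's implementation: `ReaderSketch` is a hypothetical stand-in for `ParquetMetaDataReader`, the enum is redefined locally, and mapping `false` to `Off` is an assumption (the description only states that enabling maps to `Required`).

```rust
/// Local stand-in for `PageIndexPolicy` (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageIndexPolicy {
    Off,
    Optional,
    Required,
}

impl From<bool> for PageIndexPolicy {
    /// `true` maps to `Required`, as the description states.
    /// Mapping `false` to `Off` is an assumption made for this sketch.
    fn from(enabled: bool) -> Self {
        if enabled {
            PageIndexPolicy::Required
        } else {
            PageIndexPolicy::Off
        }
    }
}

/// Hypothetical, stripped-down model of the reader's builder state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ReaderSketch {
    column_index: PageIndexPolicy,
    offset_index: PageIndexPolicy,
}

impl ReaderSketch {
    /// Both policies start as `Off`, matching the previous defaults.
    fn new() -> Self {
        Self {
            column_index: PageIndexPolicy::Off,
            offset_index: PageIndexPolicy::Off,
        }
    }

    /// Boolean methods delegate to the policy-based ones.
    fn with_page_indexes(self, val: bool) -> Self {
        self.with_page_index_policy(PageIndexPolicy::from(val))
    }

    fn with_column_indexes(self, val: bool) -> Self {
        self.with_column_index_policy(PageIndexPolicy::from(val))
    }

    fn with_offset_indexes(self, val: bool) -> Self {
        self.with_offset_index_policy(PageIndexPolicy::from(val))
    }

    /// Sets the same policy for both index kinds.
    fn with_page_index_policy(self, policy: PageIndexPolicy) -> Self {
        self.with_column_index_policy(policy)
            .with_offset_index_policy(policy)
    }

    fn with_column_index_policy(mut self, policy: PageIndexPolicy) -> Self {
        self.column_index = policy;
        self
    }

    fn with_offset_index_policy(mut self, policy: PageIndexPolicy) -> Self {
        self.offset_index = policy;
        self
    }
}
```

With this model, `ReaderSketch::new().with_page_indexes(true)` ends up with both fields set to `Required`, while `ReaderSketch::new().with_column_index_policy(PageIndexPolicy::Optional)` changes only the column index policy and leaves the offset index at `Off`.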
   
   # Are these changes tested?
   
   Yes, a new test file `parquet/tests/page_index.rs` has been added to cover 
the functionality of the new `PageIndexPolicy` and its integration with 
`ParquetMetaDataReader`.
   
   # Are there any user-facing changes?
   
   Yes, there are user-facing changes to the `ParquetMetaDataReader` API. The 
`with_column_indexes` and `with_offset_indexes` methods now implicitly use 
`PageIndexPolicy::Required` when enabling page indexes. New methods 
`with_page_index_policy`, `with_column_index_policy`, and 
`with_offset_index_policy` have been added.
   


