Ted-Jiang opened a new pull request, #1762:
URL: https://github.com/apache/arrow-rs/pull/1762
# Which issue does this PR close?
Closes #1761.
# Rationale for this change
Read the page index into memory so that page-level filtering can be applied in the future.
# What changes are included in this PR?
Add an option to read page index in `parquet/src/file/serialized_reader.rs`.
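To illustrate the opt-in design, here is a minimal, self-contained sketch. All type and method names in it (`SketchReadOptionsBuilder`, `SketchReader`, `PageEntry`, `candidate_pages`) are hypothetical stand-ins and are not the code added in this PR. It assumes a page index can be modelled as per-page min/max values plus an offset, and that the reader keeps it in memory only when the option is enabled.

```rust
/// Hypothetical stand-in for one page's statistics and location.
#[derive(Debug, Clone, PartialEq)]
pub struct PageEntry {
    pub min: i64,
    pub max: i64,
    pub offset: u64,
}

/// Hypothetical read options: loading the page index is opt-in.
#[derive(Debug, Clone, Default)]
pub struct SketchReadOptions {
    pub read_page_index: bool,
}

/// Hypothetical builder for the options above.
#[derive(Debug, Default)]
pub struct SketchReadOptionsBuilder {
    read_page_index: bool,
}

impl SketchReadOptionsBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Ask the reader to load the page index when the file is opened.
    pub fn with_page_index(mut self) -> Self {
        self.read_page_index = true;
        self
    }

    pub fn build(self) -> SketchReadOptions {
        SketchReadOptions {
            read_page_index: self.read_page_index,
        }
    }
}

/// Hypothetical reader that holds the page index only when requested.
pub struct SketchReader {
    page_index: Option<Vec<PageEntry>>,
}

impl SketchReader {
    /// `stored` plays the role of the index data stored in the file.
    pub fn open(stored: &[PageEntry], options: &SketchReadOptions) -> Self {
        let page_index = if options.read_page_index {
            Some(stored.to_vec())
        } else {
            None
        };
        Self { page_index }
    }

    /// The in-memory page index, or `None` if it was not requested.
    pub fn page_index(&self) -> Option<&[PageEntry]> {
        self.page_index.as_deref()
    }

    /// Offsets of pages whose [min, max] range may contain `value`.
    /// Returns `None` when no page index was loaded.
    pub fn candidate_pages(&self, value: i64) -> Option<Vec<u64>> {
        self.page_index.as_ref().map(|pages| {
            pages
                .iter()
                .filter(|p| p.min <= value && value <= p.max)
                .map(|p| p.offset)
                .collect()
        })
    }
}
```

With the default options the sketch reader carries no index, so existing callers pay no extra cost. A caller that enables the option can later prune pages by their min/max range, which is the kind of page-level filter the rationale refers to.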
# Are there any user-facing changes?
In `parquet-testing`, only `data_index_bloom_encoding_stats.parquet` has a page
index, and it contains a single row group.
I will generate a test file based on `alltypes_plain.parquet` (which does not
contain any page index) in the `parquet-testing` repo, and add support for
multiple row groups in a follow-up PR.
```
parquet-tools column-index ./alltypes_plain.parquet
row group 0:
column index for column id:
NONE
offset index for column id:
NONE
.
.
.
column index for column timestamp_col:
NONE
offset index for column timestamp_col:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]