etseidl commented on code in PR #8071: URL: https://github.com/apache/arrow-rs/pull/8071#discussion_r2261631007
########## parquet/src/file/metadata/reader.rs: ########## @@ -105,7 +126,15 @@ impl FooterTail { impl ParquetMetaDataReader { /// Create a new [`ParquetMetaDataReader`] pub fn new() -> Self { - Default::default() + Self { Review Comment: Is this change necessary? Won't the policies default to `Skip` anyway? ########## parquet/src/file/metadata/reader.rs: ########## @@ -593,7 +642,15 @@ impl ParquetMetaDataReader { col_idx, ) } - None => Err(general_err!("missing offset index")), + None => { Review Comment: This change here is the heart of the PR. The thing that gives me pause is simply replacing missing indexes with empty vectors doesn't let the user know that the indexes are in a potentially unusable state. Are the indexes missing for a single column chunk? An entire row group? We can't really tell without doing a validation step after decoding is complete. I think if we move forward with this, I'd prefer rather than inserting invalid indexes, we instead invalidate the entire page index (i.e. set column and offset index back to `None` in the `ParquetMetaData`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org