Jeadie commented on code in PR #8071: URL: https://github.com/apache/arrow-rs/pull/8071#discussion_r2258651235
########## parquet/src/file/metadata/reader.rs: ########## @@ -593,7 +642,15 @@ impl ParquetMetaDataReader { col_idx, ) } - None => Err(general_err!("missing offset index")), + None => { + if self.offset_index == PageIndexPolicy::Required { + return Err(general_err!("missing offset index")); + } + Ok(OffsetIndexMetaData { Review Comment: A question from naivetaty. What are the implication of `page_locations` being empty? i.e. What behaviour is assumed if this is empty? possibly 1. there are no associated pages 2. we have no pre-indexed idea about which pages are associated so we must calculate it ourselves. I've started looking at this, but it is convoluted. I feel the most correct approach would be to change the `ParquetOffsetIndex` to have Options, i.e. ```diff - pub type ParquetOffsetIndex = Vec<Vec<OffsetIndexMetaData>>; + pub type ParquetOffsetIndex = Vec<Vec<Option<OffsetIndexMetaData>>>; ``` This is a bit more involved, but semantically more correct (again from my understanding). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org