Jeadie commented on code in PR #8071:
URL: https://github.com/apache/arrow-rs/pull/8071#discussion_r2258651235


##########
parquet/src/file/metadata/reader.rs:
##########
@@ -593,7 +642,15 @@ impl ParquetMetaDataReader {
                                     col_idx,
                                 )
                             }
-                            None => Err(general_err!("missing offset index")),
+                            None => {
+                                if self.offset_index == 
PageIndexPolicy::Required {
+                                    return Err(general_err!("missing offset 
index"));
+                                }
+                                Ok(OffsetIndexMetaData {

Review Comment:
   A question from naivetaty. What are the implication of `page_locations` 
being empty? i.e. What behaviour is assumed if this is empty? possibly
    1. there are no associated pages
    2. we have no pre-indexed idea about which pages are associated so we must 
calculate it ourselves.
   
   I've started looking at this, but it is convoluted.
   
   I feel the most correct approach would be to change the `ParquetOffsetIndex` 
to have Options, i.e.
   ```diff
   - pub type ParquetOffsetIndex = Vec<Vec<OffsetIndexMetaData>>;
   + pub type ParquetOffsetIndex = Vec<Vec<Option<OffsetIndexMetaData>>>; 
   ```
   
   This is a bit more involved, but semantically more correct (again from my 
understanding). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to