etseidl commented on code in PR #8071:
URL: https://github.com/apache/arrow-rs/pull/8071#discussion_r2261631007


##########
parquet/src/file/metadata/reader.rs:
##########
@@ -105,7 +126,15 @@ impl FooterTail {
 impl ParquetMetaDataReader {
     /// Create a new [`ParquetMetaDataReader`]
     pub fn new() -> Self {
-        Default::default()
+        Self {

Review Comment:
   Is this change necessary? Won't the policies default to `Skip` anyway?



##########
parquet/src/file/metadata/reader.rs:
##########
@@ -593,7 +642,15 @@ impl ParquetMetaDataReader {
                                     col_idx,
                                 )
                             }
-                            None => Err(general_err!("missing offset index")),
+                            None => {

Review Comment:
   This change here is the heart of the PR. The thing that gives me pause is 
simply replacing missing indexes with empty vectors doesn't let the user know 
that the indexes are in a potentially unusable state. Are the indexes missing 
for a single column chunk? An entire row group? We can't really tell without 
doing a validation step after decoding is complete. 
   
   I think if we move forward with this, I'd prefer rather than inserting 
invalid indexes, we instead invalidate the entire page index (i.e. set column 
and offset index back to `None` in the `ParquetMetaData`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to