etseidl commented on code in PR #6466:
URL: https://github.com/apache/arrow-rs/pull/6466#discussion_r1779842671


##########
parquet/src/file/serialized_reader.rs:
##########
@@ -204,41 +206,30 @@ impl<R: 'static + ChunkReader> SerializedFileReader<R> {
                 }
             }
             if keep {
-                filtered_row_groups.push(rg_meta);
+                metadata_builder = metadata_builder.add_row_group(rg_meta);
             }
         }
 
         if options.enable_page_index {
             let mut columns_indexes = vec![];
             let mut offset_indexes = vec![];
 
-            for rg in &mut filtered_row_groups {
+            for rg in metadata_builder.row_groups().iter() {

Review Comment:
   I mean: replace
   ```rust
           if options.enable_page_index {
               let mut columns_indexes = vec![];
               let mut offset_indexes = vec![];

               for rg in metadata_builder.row_groups().iter() {
                   let column_index = index_reader::read_columns_indexes(&chunk_reader, rg.columns())?;
                   let offset_index = index_reader::read_offset_indexes(&chunk_reader, rg.columns())?;
                   columns_indexes.push(column_index);
                   offset_indexes.push(offset_index);
               }
               metadata_builder = metadata_builder
                   .set_column_index(Some(columns_indexes))
                   .set_offset_index(Some(offset_indexes));
           }
   ```
   with the code snippet from my earlier comment. This should be a bit more efficient, since `read_page_indexes` will fetch the necessary bytes from the file in a single read, rather than two reads per row group.
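   To illustrate the efficiency argument (this is a standalone sketch, not the arrow-rs API: `coalesce` and the in-memory "file" are hypothetical stand-ins), the idea is to compute one covering byte range over all page-index locations, issue a single read, and then slice the individual indexes out of that buffer:

   ```rust
   /// Compute one covering range (start, len) over all (offset, len) pairs,
   /// so a single read can replace one read per pair.
   fn coalesce(ranges: &[(usize, usize)]) -> Option<(usize, usize)> {
       let start = ranges.iter().map(|r| r.0).min()?;
       let end = ranges.iter().map(|r| r.0 + r.1).max()?;
       Some((start, end - start))
   }

   fn main() {
       // Pretend this byte buffer is the file; each (offset, len) pair is one
       // row group's column/offset index blob in the footer region.
       let file: Vec<u8> = (0u8..=99).collect();
       let ranges = [(10, 5), (20, 4), (40, 8)];

       // One read covering everything, then per-range slicing in memory.
       let (start, len) = coalesce(&ranges).unwrap();
       let buf = &file[start..start + len];
       for &(off, l) in &ranges {
           let slice = &buf[off - start..off - start + l];
           assert_eq!(slice, &file[off..off + l]);
       }
       println!("1 read instead of {}", ranges.len());
   }
   ```

   The trade-off is that the coalesced range may include bytes between the indexes, but for page indexes (which Parquet writers lay out contiguously before the footer) the overlap is small and the saved I/O round trips dominate.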



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]