bjchambers commented on a change in pull request #1246:
URL: https://github.com/apache/arrow-rs/pull/1246#discussion_r794993876



##########
File path: parquet/src/arrow/array_reader.rs
##########
@@ -214,6 +214,10 @@ where
         // save definition and repetition buffers
         self.def_levels_buffer = self.record_reader.consume_def_levels()?;
         self.rep_levels_buffer = self.record_reader.consume_rep_levels()?;
+
+        // Must consume bitmap buffer
+        self.record_reader.consume_bitmap_buffer()?;

Review comment:
       One place it comes up is when you have multiple Parquet files representing a "table", with one or more columns being relatively sparse. If a file is dropped every hour, it may be that some of the files have an entirely null column while others have a few values.

   It doesn't seem likely this would be the bottleneck (the other columns in the file probably would be), but that's at least how it has come up for us.
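To illustrate why the added `consume_bitmap_buffer()` call matters, here is a minimal toy model of the consume-between-batches pattern. This is a hypothetical sketch, not the parquet crate's actual `RecordReader`: the struct, its fields, and `read_batch` are invented for illustration; only the `consume_*` method names mirror the ones in the diff. If the definition levels are consumed after each batch but the null bitmap is not, the bitmap accumulates stale entries from earlier (possibly all-null) chunks and falls out of sync with the levels.

```rust
// Toy model of the record-reader buffering pattern. Hypothetical types and
// method bodies; only the consume_* names are borrowed from the PR diff.
struct RecordReader {
    def_levels: Vec<i16>,
    null_bitmap: Vec<bool>,
}

impl RecordReader {
    fn new() -> Self {
        Self { def_levels: Vec::new(), null_bitmap: Vec::new() }
    }

    // Append one batch of optional values, recording a definition level
    // and a validity bit for each slot.
    fn read_batch(&mut self, values: &[Option<i32>]) {
        for v in values {
            self.def_levels.push(if v.is_some() { 1 } else { 0 });
            self.null_bitmap.push(v.is_some());
        }
    }

    // Each consume_* call hands the buffered data to the caller and
    // resets the internal buffer for the next batch.
    fn consume_def_levels(&mut self) -> Vec<i16> {
        std::mem::take(&mut self.def_levels)
    }

    fn consume_bitmap_buffer(&mut self) -> Vec<bool> {
        std::mem::take(&mut self.null_bitmap)
    }
}

fn main() {
    let mut reader = RecordReader::new();

    // Batch 1: an entirely null column chunk (the sparse-file case above).
    reader.read_batch(&[None, None, None]);
    assert_eq!(reader.consume_def_levels().len(), 3);
    // If this call were skipped, three stale bitmap entries would be
    // prepended to the next batch's validity data.
    assert_eq!(reader.consume_bitmap_buffer().len(), 3);

    // Batch 2: a chunk with a few values; both buffers line up again.
    reader.read_batch(&[Some(7), None]);
    assert_eq!(reader.consume_def_levels().len(), 2);
    assert_eq!(reader.consume_bitmap_buffer().len(), 2);

    println!("buffers stay in sync when every consume_* call is made");
}
```

The point of the sketch is only the invariant: every buffer the reader fills per batch must also be drained per batch, or lengths drift apart across batches.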



