yordan-pavlov commented on a change in pull request #1130:
URL: https://github.com/apache/arrow-rs/pull/1130#discussion_r777699023



##########
File path: parquet/src/arrow/arrow_array_reader.rs
##########
@@ -420,6 +456,24 @@ impl<'a, C: ArrayConverter + 'a> ArrowArrayReader<'a, C> {
         }
     }
 
+    fn count_def_level_values(
+        column_desc: &ColumnDescriptor,
+        level_decoder: crate::encodings::levels::LevelDecoder,
+        num_values: usize,
+    ) -> Result<usize> {
+        let mut def_level_decoder = LevelValueDecoder::new(level_decoder);
+        let def_level_array =

Review comment:
       Yes, with this fix the def levels are decoded a second time, and an i16 
array and a boolean array are created to count the non-null values. Both 
arrays are short-lived, and the performance cost is surprisingly small (3% to 
8% in my benchmark run, see 
https://github.com/apache/arrow-rs/issues/1111#issuecomment-1003718555). Even 
after this change, `ArrowArrayReader` is still often several times faster 
than the old `ArrayReader` at decoding strings; that hasn't changed much.
   
   It's probably possible to make this more efficient, but that would be a 
bigger change that needs more thought and more time.
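The counting step described above can be sketched in plain Rust. This is not the PR's code: the function names `count_non_null_via_mask` and `count_non_null_direct` are hypothetical. The sketch assumes the def levels have already been decoded into an `i16` slice and relies on the Parquet rule that a leaf value is present (non-null) exactly when its definition level equals the column's max definition level. The first function mirrors the i16-array-plus-boolean-array approach. The second shows one possible way to get the same count without materialising the boolean array. It is offered only as an illustration, not as the more efficient change the comment alludes to.

```rust
/// Hypothetical helper (not part of arrow-rs): counts non-null leaf values
/// by first building a boolean "is defined" mask, then counting `true` entries.
fn count_non_null_via_mask(def_levels: &[i16], max_def_level: i16) -> usize {
    // One bool per level: true when the leaf value is present.
    let mask: Vec<bool> = def_levels.iter().map(|&l| l == max_def_level).collect();
    mask.iter().filter(|&&b| b).count()
}

/// Hypothetical helper: same count, without allocating the intermediate mask.
fn count_non_null_direct(def_levels: &[i16], max_def_level: i16) -> usize {
    def_levels.iter().filter(|&&l| l == max_def_level).count()
}

fn main() {
    // Optional flat column (max_def_level = 1): 0 = null, 1 = value present.
    let def_levels: Vec<i16> = vec![1, 0, 1, 1, 0];
    let a = count_non_null_via_mask(&def_levels, 1);
    let b = count_non_null_direct(&def_levels, 1);
    assert_eq!(a, b);
    println!("non-null values: {}", a); // prints "non-null values: 3"
}
```

Both functions scan the levels once. The difference is only the extra short-lived `Vec<bool>` allocation in the mask-based variant.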




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
