yordan-pavlov commented on a change in pull request #1130:
URL: https://github.com/apache/arrow-rs/pull/1130#discussion_r777699023
##########
File path: parquet/src/arrow/arrow_array_reader.rs
##########
@@ -420,6 +456,24 @@ impl<'a, C: ArrayConverter + 'a> ArrowArrayReader<'a, C> {
}
}
+ fn count_def_level_values(
+ column_desc: &ColumnDescriptor,
+ level_decoder: crate::encodings::levels::LevelDecoder,
+ num_values: usize,
+ ) -> Result<usize> {
+ let mut def_level_decoder = LevelValueDecoder::new(level_decoder);
+ let def_level_array =
Review comment:
Yes - def levels are decoded a second time for this fix and an i16 array
and a boolean array are created to count the non-null values, but they only
live for a very short time and the negative effect on performance is
surprisingly small (3% to 8% in my benchmark run) see here:
https://github.com/apache/arrow-rs/issues/1111#issuecomment-1003718555 ; even
after this change the `ArrowArrayReader` is still often several times faster
for decoding strings compared to the old `ArrayReader`, this hasn't changed
much.
It's probably possible to make this more efficient, but it would require
more thinking and more time for a bigger change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]