[GitHub] [arrow-rs] yordan-pavlov edited a comment on issue #1111: ArrowArrayReader Reads Too Many Values From Bit-Packed Runs

GitBox Thu, 30 Dec 2021 12:54:36 -0800


yordan-pavlov edited a comment on issue #1111:
URL: https://github.com/apache/arrow-rs/issues/1111#issuecomment-1003176898



   @tustvold 's work on making the "old" arrow reader faster looks promising, 
plus I've hit a wall in making the `ArrowArrayReader` faster in some cases, so 
I agree that it doesn't make sense to try to fix the `ArrowArrayReader` and 
instead it makes more sense to focus on finishing the work in #1082 .
   
   `ArrowArrayReader` does use def levels to calculate how many values to read, 
see here 
https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_array_reader.rs#L576
 , but reading def levels is done independently of reading of values (one of 
the main characteristics of `ArrowArrayReader`) and so if the value of 
`num_values` field from `Page` is incorrect and does indeed include NULL 
values, then the entire idea of `ArrowArrayReader` falls apart and it won't 
work correctly. It makes me wonder how the unit tests pass though, I will have 
a look at this in the next few days.
   
   One reason why I implemented `ArrowArrayReader` in a new struct so that it 
can easily be swapped out if was necessary, so I guess that time has come. Yes 
- it will result in a significant reduction in performance of reading string 
arrays, but correctness is more important than speed and also it would be only 
until #1082 is finished.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-rs] yordan-pavlov edited a comment on issue #1111: ArrowArrayReader Reads Too Many Values From Bit-Packed Runs

Reply via email to