[GitHub] [arrow-rs] tustvold edited a comment on issue #1111: ArrowArrayReader Reads Too Many Values From Bit-Packed Runs

GitBox Thu, 30 Dec 2021 11:30:59 -0800


tustvold edited a comment on issue #1111:
URL: https://github.com/apache/arrow-rs/issues/1111#issuecomment-1003153596



   So I'm not sure there is an easy way to fix this... `ArrowArrayReader` 
flattens all the pages from all the column chunks into iterators and then feeds 
these to `CompositeValueDecoder`, which decodes the levels and values 
independently. This makes it a non-trivial change to decode the levels and 
corresponding values from a given page in lock-step, which I believe is 
necessary in order to decode the correct number of values.
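
   To illustrate what "lock-step" means here, below is a minimal toy sketch, 
not the actual `ArrowArrayReader` or `CompositeValueDecoder` code. `ToyPage` 
and `decode_lock_step` are hypothetical names invented for this example, and 
it assumes the over-read comes from bit-packed level runs being padded to a 
multiple of 8 entries, so the decoded levels can be longer than the value 
count declared for the page. The levels are truncated per page before they 
are used to decide how many values to pull from that same page.

```rust
/// Hypothetical, simplified page. This is NOT the parquet crate's `Page` type.
struct ToyPage {
    /// Number of level entries (nulls included) declared for this page.
    num_values: usize,
    /// Decoded definition levels. Assumption: a bit-packed run is padded to a
    /// multiple of 8 entries, so this may be longer than `num_values`.
    def_levels: Vec<i16>,
    /// The non-null values stored in this page.
    values: Vec<i32>,
}

/// Decode levels and values page by page, trusting `num_values` rather than
/// the (possibly padded) length of the decoded levels.
fn decode_lock_step(pages: &[ToyPage], max_def_level: i16) -> Vec<Option<i32>> {
    let mut out = Vec::new();
    for page in pages {
        // Drop any padding entries before interpreting the levels.
        let level_count = page.num_values.min(page.def_levels.len());
        let levels = &page.def_levels[..level_count];

        // Values are consumed from the same page the levels came from.
        let mut values = page.values.iter();
        for &level in levels {
            if level == max_def_level {
                let value = values
                    .next()
                    .expect("page holds fewer values than its levels imply");
                out.push(Some(*value));
            } else {
                out.push(None);
            }
        }
    }
    out
}
```

   With two toy pages whose levels are each padded to 8 entries, only the 
declared number of entries per page is emitted. A decoder that instead 
concatenated the padded levels of every page and consumed them independently 
of the values would treat the padding as real entries.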
   
   Rather than spending time re-working `ArrowArrayReader` in order to fix this 
bug, I'm **personally** going to focus on getting #1082 and the PRs it builds 
on polished up. This provides an alternative implementation for reading byte 
arrays that builds on the existing `ColumnReaderImpl` and `RecordReader` logic 
and so, much like `PrimitiveArrayReader`, does not run into this bug. My hope 
is that, by being both faster and duplicating less code, it will make sense to 
swap out `ArrowArrayReader` for it, and therefore fix this bug for anything not 
using `ArrowArrayReader` explicitly. 
   
   If someone else wishes to work on fixing `ArrowArrayReader` that would be 
brilliant, but I'm going to focus my efforts elsewhere.
   
   FYI @yordan-pavlov @alamb 
   
   Edit: In the short term, switching back to `ComplexObjectArrayReader` does 
fix the bug, but it represents a non-trivial performance regression (up to 6x), 
so I'm somewhat loath to suggest it.


