tustvold commented on issue #1111:
URL: https://github.com/apache/arrow-rs/issues/1111#issuecomment-1003153596


   I'm not sure there is an easy way to fix this. `ArrowArrayReader` 
flattens all the pages from all the column chunks into iterators and then feeds 
these to `CompositeValueDecoder`, which decodes the levels and values 
independently. That makes it a non-trivial change to decode the levels and 
corresponding values from a given page in lock-step, which I believe is 
necessary in order to decode the correct number of values.
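   
   To illustrate what decoding in lock-step means here, the following is a 
minimal Rust sketch. The `Page` struct and the `decode_lock_step` function are 
hypothetical simplifications for illustration, not types from the parquet 
crate. The sketch assumes a flat optional column in which a definition level 
equal to the maximum marks a non-null value:

```rust
/// Hypothetical, simplified model of a data page (not the parquet crate's API).
struct Page {
    /// One definition level per logical slot in the page.
    def_levels: Vec<i16>,
    /// Only the non-null values, densely packed.
    values: Vec<i32>,
}

/// Decode pages one at a time: read a page's levels, then take exactly as
/// many values from that same page as its levels mark as non-null.
fn decode_lock_step(pages: &[Page], max_def_level: i16) -> Vec<Option<i32>> {
    let mut out = Vec::new();
    for page in pages {
        let mut values = page.values.iter().copied();
        for &level in &page.def_levels {
            if level == max_def_level {
                // The level says a value is present, so consume one from this page.
                let v = values.next().expect("page has fewer values than its levels imply");
                out.push(Some(v));
            } else {
                // A lower level means null: no value is stored for this slot.
                out.push(None);
            }
        }
    }
    out
}
```

   The point of the sketch is that the number of values to read from a page is 
only known once that page's levels have been read, which is hard to guarantee 
when levels and values are consumed through separate, independent iterators.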
   
   Rather than spending time re-working `ArrowArrayReader` in order to fix this 
bug, I'm **personally** going to focus on getting #1082 and the PRs it builds 
on polished up. This provides an alternative implementation for reading byte 
arrays that builds on the existing `ColumnReaderImpl` and `RecordReader` logic 
and so, much like `PrimitiveArrayReader`, does not run into this bug. My hope 
is that, by being both faster and duplicating less code, it will make sense to 
swap out `ArrowArrayReader` and therefore fix this bug for anything not using 
`ArrowArrayReader` explicitly. 
   
   If someone else wishes to work on fixing `ArrowArrayReader`, that would be 
brilliant, but I'm going to focus my efforts elsewhere.
   
   FYI @yordan-pavlov @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

