nevi-me commented on pull request #8829:
URL: https://github.com/apache/arrow/pull/8829#issuecomment-743885020


   Hi @jorgecarleitao, I also spent most of today looking into this :( 
   
   The problem was a bit vs byte issue on `consume_record_data` and 
`consume_rep_levels`, so what was mostly needed was a fresh pair of eyes. I 
opened https://github.com/jorgecarleitao/arrow/pull/22. I didn't address the 
`rustfmt` lint, so you could see what I changed.
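
   To make the bit vs byte distinction concrete, here is a minimal sketch. It 
is not the code from the PR: the function names `byte_len` and 
`byte_len_confused` are hypothetical and exist only for illustration.

```rust
/// Number of bytes needed to hold `num_values` values of `bit_width` bits each
/// (e.g. `bit_width == 32` for an `i32`). Hypothetical helper, not Arrow API.
fn byte_len(num_values: usize, bit_width: usize) -> usize {
    // Round up so that sub-byte widths (e.g. 1-bit booleans) still fit.
    (num_values * bit_width + 7) / 8
}

/// The kind of mix-up described above: a bit count is returned where the
/// caller expects a byte count, so the result is off by a factor of 8.
fn byte_len_confused(num_values: usize, bit_width: usize) -> usize {
    num_values * bit_width
}
```

   For 100 `i32` values the first function gives 400 while the second gives 
3200; whichever direction the confusion goes, offsets and lengths derived from 
the wrong unit end up pointing at the wrong part of the buffer.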
   
   > I verified that in all 4 tests the content passed to 
`ColumnReaderImpl::read_batch` is the same in master and after this PR, but for 
some reason the outcome of that call is different, which hints that there is 
some other invariant beyond the ones that we are telling the compiler about 
through the typing and lifetime system.
   
   This was hidden away by the way that we compare arrays. Because we only 
print the first 10 & last 10 values of arrays, the issue wasn't visible, but 
comparing `ArrayData` surfaced the issue with the failing tests. I had to 
change the test utilities that generate data to return sequential values 
instead of randomly generated data. Then I was able to see why 
`consume_record_data` was filling the first 12 and last 12 slots with data 
(then the part that we don't print was just 0s).
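
   A rough sketch of why the truncated output hid the problem, assuming a 
display that keeps only the first and last 10 values. The helpers 
`truncated_summary`, `sequential` and `partially_filled` are made up for this 
illustration and are not the actual test utilities.

```rust
/// Hypothetical summary that shows only the first and last `n` values,
/// mimicking a truncated array printout.
fn truncated_summary(values: &[i32], n: usize) -> String {
    if values.len() <= 2 * n {
        return format!("{:?}", values);
    }
    let head = &values[..n];
    let tail = &values[values.len() - n..];
    format!("{:?} ... {:?}", head, tail)
}

/// Sequential test data: 0, 1, 2, ..., len - 1.
fn sequential(len: usize) -> Vec<i32> {
    (0..len as i32).collect()
}

/// Simulates the faulty result: only the first and last `filled` slots hold
/// the expected value, everything in between is left as 0.
fn partially_filled(len: usize, filled: usize) -> Vec<i32> {
    (0..len)
        .map(|i| {
            if i < filled || i >= len.saturating_sub(filled) {
                i as i32
            } else {
                0
            }
        })
        .collect()
}
```

   With `sequential(100)` as the expected data and `partially_filled(100, 12)` 
as the faulty result, the two summaries for `n = 10` are identical, while a 
full comparison of the vectors fails. Sequential values also make the first 
wrong slot easy to spot, which random data does not.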
   
   I also changed the code slightly to initialise the buffers with `let 
new_buffer = MutableBuffer::new(0)`, because we always perform an allocation on 
`new_buffer.resize(num_bytes)` anyway, as we never get the initial allocation 
right.
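
   The same pattern, sketched with a plain `Vec<u8>` as a stand-in for 
`MutableBuffer` (an assumption made to keep the example self-contained; 
`zeroed_buffer` is a hypothetical name):

```rust
/// Start with zero capacity and let the resize perform the only allocation.
/// `Vec<u8>` stands in for `MutableBuffer` here; this is not Arrow API.
fn zeroed_buffer(num_bytes: usize) -> Vec<u8> {
    // `Vec::new()` does not allocate, so no initial guess is wasted.
    let mut new_buffer: Vec<u8> = Vec::new();
    // The resize allocates the required size and zero-fills it.
    new_buffer.resize(num_bytes, 0);
    new_buffer
}
```

   If the initial capacity guess is never right, starting from 0 avoids one 
allocation that would be thrown away on the first resize.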



