nevi-me commented on pull request #8829: URL: https://github.com/apache/arrow/pull/8829#issuecomment-743885020
Hi @jorgecarleitao, I also spent most of today looking into this :( The problem was a bit-vs-byte mix-up in `consume_record_data` and `consume_rep_levels`, so what was mostly needed was a fresh pair of eyes. I opened https://github.com/jorgecarleitao/arrow/pull/22. I didn't address the `rustfmt` lint, so that you can see what I changed.

> I verified that in all 4 tests the content passed to `ColumnReaderImpl::read_batch` is the same in master and after this PR, but for some reason the outcome of that call is different, which hints that there is some other invariant beyond the ones that we are telling the compiler about through the typing and lifetime system.

This was hidden by the way we compare arrays. Because we only print the first 10 and last 10 values of an array, the issue wasn't visible, but comparing the `ArrayData` surfaced it in the failing tests. I had to change the test utilities that generate data to return sequential values instead of random ones. Then I could see that `consume_record_data` was filling only the first 12 and last 12 slots with data (the part that we don't print was just 0s).

I also changed the code slightly to initialise the buffers with `let new_buffer = MutableBuffer::new(0)` instead, because we always perform an allocation in `new_buffer.resize(num_bytes)`; we never get the initial allocation right anyway.
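The comment doesn't include the fix itself. As a rough illustration only, the Rust sketch below converts a bit count to a byte count explicitly before sizing a buffer that starts empty and is grown with `resize` (a `Vec<u8>` stands in for Arrow's `MutableBuffer`). It then checks the round trip with sequential data, compared element by element rather than through a printed prefix and suffix. `bits_to_bytes`, `write_i32s`, and `read_i32s` are hypothetical helpers, not functions from the parquet or arrow crates.

```rust
use std::mem::size_of;

/// Rounds a number of bits up to the number of bytes needed to hold them.
fn bits_to_bytes(num_bits: usize) -> usize {
    (num_bits + 7) / 8
}

/// Hypothetical helper (not the parquet crate's API): writes `values` as
/// little-endian i32s into a buffer that starts empty and is grown with
/// `resize`, similar to `MutableBuffer::new(0)` followed by a resize.
fn write_i32s(values: &[i32]) -> Vec<u8> {
    let value_bits = values.len() * 8 * size_of::<i32>();
    // Convert bits to bytes explicitly; treating a bit count as a byte count
    // (or vice versa) is the kind of mistake described in the comment.
    let num_bytes = bits_to_bytes(value_bits);
    let mut buffer: Vec<u8> = Vec::new();
    buffer.resize(num_bytes, 0);
    for (i, v) in values.iter().enumerate() {
        let off = i * size_of::<i32>();
        buffer[off..off + size_of::<i32>()].copy_from_slice(&v.to_le_bytes());
    }
    buffer
}

/// Decodes a little-endian byte buffer back into i32 values.
fn read_i32s(buffer: &[u8]) -> Vec<i32> {
    buffer
        .chunks_exact(size_of::<i32>())
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    // Sequential test data starting at 1: any slot left at 0 stands out
    // immediately, which random data would hide.
    let expected: Vec<i32> = (1..=100).collect();
    let actual = read_i32s(&write_i32s(&expected));
    // Compare every element, not just the first and last few.
    assert_eq!(actual, expected);
}
```

Comparing whole arrays (as comparing `ArrayData` did here) is what makes the zero-filled middle section visible; printing only a prefix and suffix can show correct-looking values at both ends.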
