albertlockett opened a new pull request, #8573:
URL: https://github.com/apache/arrow-rs/pull/8573

   # Which issue does this PR close?
   
   - Closes #8404
   
   # Rationale for this change
   
   Issue #8404 reported a regression that was introduced in 
https://github.com/apache/arrow-rs/pull/7585. This PR fixes it.
   
   # What changes are included in this PR?
   
   The root cause was that `ByteArrayDictionaryReader` returns a new, empty 
(zero-length) values array if the record reader has already been consumed, 
but it did not advance the repetition and definition level buffers in this 
early-return case.
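
   To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the parquet crate: `PendingRecords`, `BuggyReader`, `consume_batch` and `get_def_levels` are hypothetical, simplified stand-ins, repetition levels are omitted, and it assumes the reader exposes the levels of the last batch through a buffer that is only refreshed on the non-empty path. Under those assumptions, the second call returns no values while the exposed levels still describe the previous batch.

```rust
/// Hypothetical stand-in for the record reader's pending (not yet consumed) data.
struct PendingRecords {
    values: Vec<Option<String>>, // one entry per slot (None = null slot)
    def_levels: Vec<i16>,        // one definition level per slot
}

/// Simplified, hypothetical model of the pre-fix behaviour.
struct BuggyReader {
    pending: PendingRecords,
    def_levels_buffer: Option<Vec<i16>>, // levels exposed to the parent reader
}

impl BuggyReader {
    fn consume_batch(&mut self) -> Vec<Option<String>> {
        if self.pending.values.is_empty() {
            // Early return: an empty array, but `def_levels_buffer` is not
            // refreshed, so it still holds the levels of the previous batch.
            return Vec::new();
        }
        self.def_levels_buffer = Some(std::mem::take(&mut self.pending.def_levels));
        std::mem::take(&mut self.pending.values)
    }

    fn get_def_levels(&self) -> Option<&[i16]> {
        self.def_levels_buffer.as_deref()
    }
}

fn main() {
    let mut reader = BuggyReader {
        pending: PendingRecords {
            values: vec![Some("a".to_string()), None, Some("b".to_string())],
            def_levels: vec![2, 0, 2],
        },
        def_levels_buffer: None,
    };

    assert_eq!(reader.consume_batch().len(), 3);

    // Second call: the reader is exhausted, so no values are returned...
    assert!(reader.consume_batch().is_empty());
    // ...yet the exposed levels still describe three slots.
    assert_eq!(reader.get_def_levels().map(|l| l.len()), Some(3));
}
```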
   
   
https://github.com/apache/arrow-rs/blob/521f219e308613811aeae11300bf7a7b0fb5ec29/parquet/src/arrow/array_reader/byte_array_dictionary.rs#L167-L183
   
   The `StructArrayReader` reads the repetition and definition levels from its 
first child to determine the nullability of the struct array. Because the empty 
values buffer was returned for the child without advancing the repetition and 
definition level buffers, the `StructArrayReader` saw a length mismatch between 
the empty child array and the non-empty nullability bitmask, which produced the 
error.
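
   The sketch below models that struct-level check in isolation, assuming (for illustration) a nullable struct whose slots are non-null when the first child's definition level is at least `struct_def_level`. `struct_validity` is a hypothetical helper, not the actual `StructArrayReader` code, and it reports the mismatch as an explicit error rather than however the real reader surfaces it.

```rust
/// Hypothetical helper: derive a struct's validity mask from the definition
/// levels of its first child, rejecting inputs whose lengths disagree.
fn struct_validity(
    child_len: usize,
    first_child_def_levels: &[i16],
    struct_def_level: i16, // slots at or above this level have a non-null struct
) -> Result<Vec<bool>, String> {
    if first_child_def_levels.len() != child_len {
        return Err(format!(
            "length mismatch: child has {} values but {} definition levels",
            child_len,
            first_child_def_levels.len()
        ));
    }
    Ok(first_child_def_levels
        .iter()
        .map(|&level| level >= struct_def_level)
        .collect())
}

fn main() {
    // Consistent inputs: three slots, the middle struct is null.
    assert_eq!(struct_validity(3, &[2, 0, 2], 1), Ok(vec![true, false, true]));

    // The regression: an empty child array paired with stale, non-empty levels.
    assert!(struct_validity(0, &[2, 0, 2], 1).is_err());
}
```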
   
   
https://github.com/apache/arrow-rs/blob/521f219e308613811aeae11300bf7a7b0fb5ec29/parquet/src/arrow/array_reader/struct_array.rs#L137-L170
   
   The fix is simple: always have `ByteArrayDictionaryReader` advance the 
repetition and definition level buffers when `consume_next_batch` is called, 
including in the early-return case.
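
   A sketch of the fixed behaviour, using the same kind of simplified, hypothetical reader (`FixedReader`, `consume_batch` and `get_def_levels` are illustrative names, and only definition levels are modelled): the level buffer is advanced on every call before the early return, so an exhausted reader reports zero values and zero levels.

```rust
/// Hypothetical model of the fixed behaviour: the level buffer is refreshed
/// on every call, including the early-return path.
struct FixedReader {
    pending_values: Vec<Option<String>>,
    pending_def_levels: Vec<i16>,
    def_levels_buffer: Option<Vec<i16>>, // levels exposed to the parent reader
}

impl FixedReader {
    fn consume_batch(&mut self) -> Vec<Option<String>> {
        // Always advance the levels first, so they match whatever is returned.
        self.def_levels_buffer = Some(std::mem::take(&mut self.pending_def_levels));
        if self.pending_values.is_empty() {
            // Early return is now safe: the exposed levels are empty too.
            return Vec::new();
        }
        std::mem::take(&mut self.pending_values)
    }

    fn get_def_levels(&self) -> Option<&[i16]> {
        self.def_levels_buffer.as_deref()
    }
}

fn main() {
    let mut reader = FixedReader {
        pending_values: vec![Some("a".to_string()), None],
        pending_def_levels: vec![2, 0],
        def_levels_buffer: None,
    };
    assert_eq!(reader.consume_batch().len(), 2);
    assert_eq!(reader.get_def_levels().map(|l| l.len()), Some(2));

    // Exhausted reader: values and levels now agree on a length of zero.
    assert!(reader.consume_batch().is_empty());
    assert_eq!(reader.get_def_levels().map(|l| l.len()), Some(0));
}
```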
   
   # Are these changes tested?
   
   Yes. A new unit test, 
`test_read_nullable_structs_with_binary_dict_as_first_child_column`, was added; 
it reproduces the issue when run without the changes in this PR.
   
   # Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.