tustvold commented on issue #3029: URL: https://github.com/apache/arrow-rs/issues/3029#issuecomment-1305076395
I've found the underlying cause of this: an accounting bug in `RLEDecoder::get_batch_with_dict`. In particular, if the runs are longer than 1024 values, it may try to read more values from the underlying bit reader than there is capacity for. If the actual number of values is not a multiple of 8, this read will return more values than there is capacity for, because the length of a bit-packed run is ambiguous. Such a scenario results in a panic when it tries to copy these values across. Will post a PR to fix shortly.
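To illustrate the kind of accounting involved, here is a minimal sketch. It is not the arrow-rs code and not the actual fix. In Parquet's RLE/bit-packed hybrid encoding, a bit-packed run header counts groups of 8 values, so the header can claim more values than were really written. The sketch assumes a scratch buffer of 1024 entries (taken from the figure above) and caps every read by the values still wanted, the values left in the run, and the buffer size. All names (`INDEX_BUF_LEN`, `bit_packed_run_len`, `next_batch_len`, `decode_run`) are hypothetical.

```rust
/// Assumed scratch buffer size, based on the 1024 figure in the comment above.
const INDEX_BUF_LEN: usize = 1024;

/// Number of values a bit-packed run header claims: always whole groups of 8.
fn bit_packed_run_len(num_groups: usize) -> usize {
    num_groups * 8
}

/// Number of values to read in one step, capped by all three limits:
/// values the caller still wants, values left in the run, and buffer capacity.
fn next_batch_len(wanted_remaining: usize, run_remaining: usize) -> usize {
    wanted_remaining.min(run_remaining).min(INDEX_BUF_LEN)
}

/// Simulates draining one bit-packed run without decoding any real bits.
/// Returns how many values would be copied to the output.
fn decode_run(actual_values: usize, num_groups: usize) -> usize {
    let mut run_remaining = bit_packed_run_len(num_groups);
    let mut wanted_remaining = actual_values;
    let mut decoded = 0;
    while wanted_remaining > 0 && run_remaining > 0 {
        let n = next_batch_len(wanted_remaining, run_remaining);
        decoded += n;
        wanted_remaining -= n;
        run_remaining -= n;
    }
    decoded
}
```

For example, 1027 real values need 129 groups, so the header claims 1032 values. With the caps in place, `decode_run(1027, 129)` stops at 1027 and never reads more than 1024 values in a single step, rather than trusting the padded length.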
