jhorstmann opened a new issue, #9434:
URL: https://github.com/apache/arrow-rs/issues/9434

   **Describe the bug**
   
   The `RleDecoder::get_batch_with_dict` function panics when it encounters dictionary indices that are out of bounds.
   
   I created two sample files that trigger this: one with an RLE-encoded dictionary key and one with a bit-packed key:
   
   [oob_bitpacked_value.zip](https://github.com/user-attachments/files/25393056/oob_bitpacked_value.zip)
   
   [oob_rle_value.zip](https://github.com/user-attachments/files/25393055/oob_rle_value.zip)
   
   **To Reproduce**
   ```
   $ parquet-read oob_bitpacked_value.parquet
   
   thread 'main' (228557) panicked at parquet/src/encodings/rle.rs:500:58:
   index out of bounds: the len is 1 but the index is 18446744073709551487
   ```
   
   ```
   $ parquet-read oob_rle_value.parquet
   
   thread 'main' (228807) panicked at parquet/src/encodings/rle.rs:468:34:
   index out of bounds: the len is 1 but the index is 4294967167
   ```
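
Both reported indices are consistent with the same negative key, -129, being reinterpreted as an unsigned index: 2^64 - 129 = 18446744073709551487 and 2^32 - 129 = 4294967167. The sketch below only reproduces that arithmetic with plain Rust casts, assuming a 64-bit `usize`. The helper names are hypothetical, and it is an assumption, not a confirmed reading of `rle.rs`, that the decoder performs exactly these conversions.

```rust
/// Hypothetical helper: an `i32 as usize` cast sign-extends the key.
fn key_as_usize(key: i32) -> usize {
    key as usize
}

/// Hypothetical helper: casting through `u32` keeps only the low 32 bits.
fn key_via_u32(key: i32) -> usize {
    key as u32 as usize
}

fn main() {
    // -129 is inferred from the panic messages, not read from the files.
    let key = -129;
    println!("{}", key_as_usize(key)); // 18446744073709551487 on a 64-bit target
    println!("{}", key_via_u32(key)); // 4294967167
}
```
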
   
   **Expected behavior**
   
   Reading these invalid files should return a `Result::Err` instead of panicking.
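
For illustration only, a minimal sketch of the expected behavior, assuming the decoder has already produced `i32` keys and a slice of dictionary values. `gather_from_dict` and `DictIndexError` are hypothetical names; this is not the `RleDecoder` API and not the fix being developed in #9365.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; a real fix would likely use the crate's own error type.
#[derive(Debug, PartialEq)]
pub struct DictIndexError {
    pub index: i32,
    pub dict_len: usize,
}

impl fmt::Display for DictIndexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "dictionary index {} out of bounds for dictionary of length {}",
            self.index, self.dict_len
        )
    }
}

impl std::error::Error for DictIndexError {}

/// Hypothetical helper: appends `dict[key]` to `out` for each key and
/// returns an error instead of panicking on an invalid key.
pub fn gather_from_dict<T: Clone>(
    dict: &[T],
    keys: &[i32],
    out: &mut Vec<T>,
) -> Result<(), DictIndexError> {
    for &key in keys {
        // `usize::try_from` rejects negative keys; `get` rejects keys >= dict.len().
        let value = usize::try_from(key)
            .ok()
            .and_then(|i| dict.get(i))
            .ok_or(DictIndexError { index: key, dict_len: dict.len() })?;
        out.push(value.clone());
    }
    Ok(())
}

fn main() {
    let dict = vec!["a"];
    let mut out = Vec::new();
    match gather_from_dict(&dict, &[0, -129], &mut out) {
        Ok(()) => println!("decoded {} values", out.len()),
        Err(e) => println!("error: {e}"),
    }
}
```

Checking each key with `usize::try_from` followed by `slice::get` covers both negative keys and keys past the end of the dictionary. Note that values decoded before the invalid key remain in `out`; a caller that needs all-or-nothing behavior would have to discard them on error.
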
   
   **Additional context**
   
   A fix is already in progress in #9365; these files could be added there as unit tests.

