alamb opened a new issue, #9742:
URL: https://github.com/apache/arrow-rs/issues/9742

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   In general, this crate should return an error on invalid data rather than panic.
   
   
https://github.com/apache/arrow-rs?tab=readme-ov-file#guidelines-for-panic-vs-result
   
   > For those caused by invalid user input, however, we prefer to report that 
invalidity gracefully as an error result instead of panicking. In general, 
invalid input should result in an Error as soon as possible.
   
   
   However, we keep hitting various paths in parquet where there are panics:
   - https://github.com/apache/arrow-rs/pull/9725
   
   Given these paths require a corrupt / invalid data source, it is hard to 
write tests for them.
   
   For example, here is a test that @xuzifu666 added for one such error: 
https://github.com/apache/arrow-rs/pull/9725/commits/0bb99427f62f36fbeaf680c62265b200881c1549
   
   However, I thought it would be hard to maintain over the long run, as the 
programmatic generation of bad data will be brittle (if we change how the thrift 
is written, for example, the truncation may go down a different path).
   
   **Describe the solution you'd like**
   
   I think we should consider some sort of parquet fuzzer that makes randomly 
bad data and ensures that the reader returns an error (rather than 
`panic`ing). It would be nice if it made some parquet files and then applied 
common data corruptions:
   1. Truncate the data (remove bytes from end of the file)
   2. Truncate the data (remove bytes from the start of the file)
   3. Flip a random bit
   4. Set a random range of the file to all zeros
   
   There are probably other good ones we can do
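
As a rough illustration of the four corruptions listed above, here is a minimal, std-only sketch. Everything in it is hypothetical and not part of the `parquet` crate: the `XorShift64` PRNG, the `Corruption` enum, `corrupt`, and `reads_without_panic` are names made up for this example. A real fuzzer would more likely build on `cargo-fuzz` or the `rand` crate, and would pass the actual parquet reader as the `read` closure.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Tiny deterministic xorshift64 PRNG so a failing case can be replayed from its seed.
/// (Hypothetical helper, used only to keep this sketch dependency-free.)
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..bound` (or 0 when `bound` is 0).
    pub fn below(&mut self, bound: usize) -> usize {
        if bound == 0 { 0 } else { (self.next_u64() % bound as u64) as usize }
    }
}

/// The four corruptions from the list above (hypothetical names).
#[derive(Debug, Clone, Copy)]
pub enum Corruption {
    TruncateEnd,
    TruncateStart,
    FlipBit,
    ZeroRange,
}

/// Returns a corrupted copy of `data`; empty input is returned unchanged.
pub fn corrupt(data: &[u8], kind: Corruption, rng: &mut XorShift64) -> Vec<u8> {
    let mut out = data.to_vec();
    if out.is_empty() {
        return out;
    }
    match kind {
        Corruption::TruncateEnd => {
            let n = 1 + rng.below(out.len()); // drop 1..=len bytes from the end
            out.truncate(out.len() - n);
        }
        Corruption::TruncateStart => {
            let n = 1 + rng.below(out.len()); // drop 1..=len bytes from the start
            out.drain(..n);
        }
        Corruption::FlipBit => {
            let i = rng.below(out.len());
            out[i] ^= 1u8 << rng.below(8); // flip exactly one bit
        }
        Corruption::ZeroRange => {
            let start = rng.below(out.len());
            let n = 1 + rng.below(out.len() - start);
            out[start..start + n].fill(0);
        }
    }
    out
}

/// Returns true if `read` came back without panicking; `Ok` and `Err` are both fine.
/// Note: this relies on unwinding, so it does not work with `panic = "abort"`.
pub fn reads_without_panic<T, E>(bytes: &[u8], read: impl Fn(&[u8]) -> Result<T, E>) -> bool {
    catch_unwind(AssertUnwindSafe(|| {
        let _ = read(bytes);
    }))
    .is_ok()
}
```

A test could loop over seeds and corruption kinds, call `corrupt` on the bytes of a known-good file, and assert `reads_without_panic` for each result, printing the seed and kind on failure so the case can be reproduced.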
   
   **Describe alternatives you've considered**
   
   **Additional context**
   Related to 
   - https://github.com/apache/arrow-rs/issues/5332
   

