alamb opened a new issue, #9742: URL: https://github.com/apache/arrow-rs/issues/9742
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

In general, this crate should return an error on invalid data rather than panic. From https://github.com/apache/arrow-rs?tab=readme-ov-file#guidelines-for-panic-vs-result:

> For those caused by invalid user input, however, we prefer to report that invalidity gracefully as an error result instead of panicking. In general, invalid input should result in an Error as soon as possible.

However, we keep hitting various code paths in parquet that panic:
- https://github.com/apache/arrow-rs/pull/9725

Because these paths are only reached with a corrupt or invalid data source, it is hard to write tests for them.

For example, here is a test that @xuzifu666 added for one such error: https://github.com/apache/arrow-rs/pull/9725/commits/0bb99427f62f36fbeaf680c62265b200881c1549

However, I thought it would be hard to maintain over the long run, as the programmatic generation of bad data is brittle (for example, if we change how the thrift is written, the truncation may go down a different code path).

**Describe the solution you'd like**

I think we should consider some sort of parquet fuzzer that generates random bad data and ensures that the reader returns an error rather than `panic`ing. It would be nice if it wrote some parquet files and then applied common kinds of data corruption:

1. Truncate the data (remove bytes from the end of the file)
2. Truncate the data (remove bytes from the start of the file)
3. Flip a random bit
4. Set a random range of the file to all zeros

There are probably other useful corruptions we could add.

**Describe alternatives you've considered**

**Additional context**

Related to:
- https://github.com/apache/arrow-rs/issues/5332
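The four corruption strategies proposed in the issue are simple byte-level mutations. Here is a minimal sketch of them, assuming the input is the raw bytes of an already-written parquet file. The names `XorShift64`, `Corruption`, and `corrupt` are hypothetical (not arrow-rs APIs), and a tiny xorshift PRNG stands in for a real randomness source such as the `rand` crate or the input bytes supplied by cargo-fuzz, so the sketch has no dependencies.

```rust
/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
/// Hypothetical helper; a real fuzzer would likely use `rand` or fuzzer-provided bytes.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift gets stuck at zero, so replace a zero seed with a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..bound` (`bound` must be > 0); slight modulo bias is fine here.
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// The four corruptions listed in the issue (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum Corruption {
    TruncateEnd,
    TruncateStart,
    FlipBit,
    ZeroRange,
}

/// Returns a corrupted copy of `data`; an empty input is returned unchanged.
fn corrupt(data: &[u8], kind: Corruption, rng: &mut XorShift64) -> Vec<u8> {
    let mut out = data.to_vec();
    if out.is_empty() {
        return out;
    }
    let len = out.len();
    match kind {
        Corruption::TruncateEnd => {
            // New length is in 0..len, so at least one trailing byte is removed.
            out.truncate(rng.below(len));
        }
        Corruption::TruncateStart => {
            // Drop between 1 and len bytes from the front.
            let n = 1 + rng.below(len);
            out.drain(..n);
        }
        Corruption::FlipBit => {
            // Flip exactly one bit of one byte.
            let byte = rng.below(len);
            let bit = rng.below(8);
            out[byte] ^= 1u8 << bit;
        }
        Corruption::ZeroRange => {
            // Zero a non-empty range [start, end) that stays within the buffer.
            let start = rng.below(len);
            let end = start + 1 + rng.below(len - start);
            out[start..end].fill(0);
        }
    }
    out
}
```

A fuzz loop could write a handful of parquet files with the existing writer, call `corrupt` repeatedly with each `Corruption` variant and different seeds, and hand each result to the reader. Keeping the PRNG seeded makes any panic it finds reproducible from the seed alone.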
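The core check the issue asks for is "the reader returns an error, not a panic." One way to express that in a test harness is `std::panic::catch_unwind`, sketched below. The `read` closure is a hypothetical stand-in for "open these bytes with the parquet reader and decode every batch"; the names `Outcome`, `check_no_panic`, and `truncations_that_panic` are illustrative assumptions rather than existing arrow-rs code. Note that `catch_unwind` only works when the crate is built with `panic = "unwind"` (the default), and the default panic hook still prints each caught panic to stderr.

```rust
use std::fmt::Display;
use std::panic::{self, AssertUnwindSafe};

/// Outcome of feeding one (possibly corrupt) input to a reader.
#[derive(Debug, PartialEq)]
enum Outcome {
    Ok,
    Err(String),
    Panicked,
}

/// Runs `read` on `input`, converting a panic into `Outcome::Panicked`.
/// `read` is a hypothetical closure wrapping the reader under test.
fn check_no_panic<F, E>(input: &[u8], read: F) -> Outcome
where
    F: FnOnce(&[u8]) -> Result<(), E>,
    E: Display,
{
    // AssertUnwindSafe: we only inspect the result, so broken invariants cannot leak out.
    match panic::catch_unwind(AssertUnwindSafe(|| read(input))) {
        Ok(Ok(())) => Outcome::Ok,
        Ok(Err(e)) => Outcome::Err(e.to_string()),
        Err(_) => Outcome::Panicked,
    }
}

/// Feeds every end-truncation of `file` (keeping 0..len bytes) to `read`
/// and returns the kept lengths that made the reader panic.
fn truncations_that_panic<F, E>(file: &[u8], read: F) -> Vec<usize>
where
    F: Fn(&[u8]) -> Result<(), E>,
    E: Display,
{
    (0..file.len())
        .filter(|&keep| check_no_panic(&file[..keep], &read) == Outcome::Panicked)
        .collect()
}
```

For a real parquet file, an exhaustive end-truncation sweep like this is cheap and deterministic, which sidesteps the brittleness mentioned above: it does not depend on which thrift field a particular hand-crafted truncation happens to cut. Random corruptions can be run through `check_no_panic` the same way.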
