Re: [PR] fix: ParquetError when reading corrupt parquet file with truncated data instead of Panic [arrow-rs]

via GitHub Thu, 16 Apr 2026 05:35:35 -0700


alamb commented on code in PR #9725:
URL: https://github.com/apache/arrow-rs/pull/9725#discussion_r3093197826



##########
parquet/src/file/metadata/reader.rs:
##########
@@ -1422,4 +1434,81 @@ mod async_tests {
         read_and_check(f.as_file(), PageIndexPolicy::Optional).unwrap();
         read_and_check(f.as_file(), PageIndexPolicy::Skip).unwrap();
     }
+
+    /// Test that corrupted parquet files return ParquetError instead of 
panicking

Review Comment:
   I think this test is too specific and will be hard to maintain over the long 
run as the programatic generation of bad data will be brittle (if we change how 
the thrift is written, for example, the truncation may go down a different 
path). 
   
   If we want to test this type of error condition I think we should either:
   1. Check in a broken parquet file
   2. (better) implement a fuzzer that breaks the parquet file in random ways 
(maybe truncates it arbitrarily, etc) and ensures there are no panics
   
   But for this PR if we want coverage I recommend we check in the bad file 
rather than trying to generate it programatically. We can consider parquet fuzz 
tsting as a follow on 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix: ParquetError when reading corrupt parquet file with truncated data instead of Panic [arrow-rs]

Reply via email to