swanandx opened a new issue, #9705: URL: https://github.com/apache/arrow-rs/issues/9705
**Describe the bug**

Reading a Parquet file with corrupted data causes a panic at:
https://github.com/apache/arrow-rs/blob/88b7fca2304b07678d4543179946ddd032d31d45/parquet/src/file/metadata/reader.rs#L533

**To Reproduce**

I couldn't write a minimal PoC, but here is the stack trace:

```
thread 'xx' (53) panicked at /home/ubuntu/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/parquet-57.3.0/src/file/metadata/reader.rs:535:17:
assertion failed: end <= remainder.len()
stack backtrace:
   0: __rustc::rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   3: <datafusion_datasource_parquet::reader::CachedParquetFileReader as parquet::arrow::async_reader::AsyncFileReader>::get_metadata::{{closure}}
   4: <datafusion_datasource_parquet::opener::ParquetOpener as datafusion_datasource::file_stream::FileOpener>::open::{{closure}}
   5: <datafusion_datasource::file_stream::FileStream as futures_core::stream::Stream>::poll_next
   6: <datafusion_physical_plan::coop::CooperativeStream<T> as futures_core::stream::Stream>::poll_next
   7: <datafusion_physical_plan::stream::BatchSplitStream as futures_core::stream::Stream>::poll_next
```

We had a Delta Lake table, and I corrupted one of its Parquet files with:

```sh
python3 -c "
data = open('/tmp/original.parquet','rb').read()
total = len(data)
# Keep first 30% of data + last 1000 bytes (footer)
head = data[:int(total * 0.3)]
foot = data[-1000:]  # footer is small, 846 bytes per metadata output
open('/tmp/corrupt.parquet','wb').write(head + foot)
print(f'Original: {total} -> Corrupt: {len(head) + len(foot)} bytes')
"
```

That led to the crash.

**Expected behavior**

It should return a `ParquetError` instead of panicking.

**Additional context**

Version: `parquet-57.3.0` (latest would fail as well)
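The reproduction likely works because the script keeps the file's trailer intact. A Parquet file ends with the serialized file metadata, then a 4-byte little-endian metadata length, then the magic bytes `PAR1`. The footer can therefore still be located and decoded, while offsets recorded in that metadata (for example column chunk or page index locations) may point into bytes that were cut out. The Python sketch below is not arrow-rs code. `read_footer_metadata`, `checked_range`, and `CorruptFileError` are hypothetical names. It illustrates that trailer layout and the expected behavior: an out-of-bounds range is reported as an error instead of triggering an assertion.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian metadata length + 4-byte "PAR1" magic


class CorruptFileError(Exception):
    """Hypothetical error type standing in for a recoverable ParquetError."""


def checked_range(buf: bytes, start: int, end: int) -> bytes:
    # Return buf[start:end], but raise a recoverable error (instead of
    # asserting) when the requested range does not fit inside the buffer.
    if start < 0 or start > end or end > len(buf):
        raise CorruptFileError(
            f"range {start}..{end} out of bounds for buffer of length {len(buf)}"
        )
    return buf[start:end]


def read_footer_metadata(data: bytes) -> bytes:
    # Layout at the end of a Parquet file:
    #   <file metadata> <4-byte LE metadata length> "PAR1"
    if len(data) < TRAILER_LEN:
        raise CorruptFileError("file too small to contain a Parquet trailer")
    trailer = data[-TRAILER_LEN:]
    if trailer[4:] != PARQUET_MAGIC:
        raise CorruptFileError("missing trailing PAR1 magic")
    (metadata_len,) = struct.unpack("<I", trailer[:4])
    meta_end = len(data) - TRAILER_LEN
    # A bogus length yields a negative start, which is reported as an error.
    return checked_range(data, meta_end - metadata_len, meta_end)
```

On a synthetic file truncated the same way as in the reproduction script (keep the head, keep the tail), the footer metadata is still returned unchanged. This is consistent with the failure surfacing later, when offsets stored in the metadata are resolved against the truncated data.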
