pchintar opened a new issue, #9801: URL: https://github.com/apache/arrow-rs/issues/9801
## Description

The IPC format specifies that compressed buffers are encoded as:

> `[8 bytes uncompressed length] + compressed data`

The current implementation assumes this invariant when reading the prefix during decompression. However, in the reader path, buffers are constructed from metadata (`offset`, `length`) and passed to the decompression logic without validating that they contain at least the required 8-byte prefix.

This means that malformed or truncated IPC input can reach the decompression path with buffers shorter than 8 bytes.

---

## Problem

The decompression logic reads the prefix unconditionally:

```rust
let decompressed_length = read_uncompressed_size(input);
```

Without a length check, reading the prefix from a buffer shorter than 8 bytes can panic.

This creates a mismatch between:

* Format expectation: buffers include an 8-byte prefix
* Runtime behavior: no validation that this invariant holds

---

## Comparison

Other components in this repository (e.g., Parquet) validate more carefully before parsing compressed data, including:

* checking input length before reading fixed-size prefix fields
* validating compressed and uncompressed sizes before decompression

This ensures malformed input is handled with errors rather than panics.

---

## Expected Behavior

The IPC decompression path should validate that the input buffer contains at least 8 bytes before reading the prefix, and return an `ArrowError` otherwise.
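
A minimal sketch of the check described under Expected Behavior, not the actual arrow-rs code. The names `IpcError`, `LENGTH_PREFIX_SIZE`, and `split_length_prefix` are hypothetical; `IpcError` stands in for `ArrowError` so the example compiles without the `arrow` crate. It assumes the prefix is a little-endian signed 64-bit integer, as the Arrow IPC format describes for compressed buffers.

```rust
/// Hypothetical stand-in for `ArrowError`, used so this sketch is self-contained.
#[derive(Debug, PartialEq)]
enum IpcError {
    /// The buffer is too short to hold the 8-byte length prefix.
    TruncatedPrefix { actual: usize },
}

/// Size of the uncompressed-length prefix, in bytes.
const LENGTH_PREFIX_SIZE: usize = 8;

/// Splits a compressed IPC buffer into (uncompressed length, compressed body).
/// Returns an error instead of panicking when the prefix is missing or truncated.
fn split_length_prefix(input: &[u8]) -> Result<(i64, &[u8]), IpcError> {
    // Validate the length before touching the prefix bytes.
    if input.len() < LENGTH_PREFIX_SIZE {
        return Err(IpcError::TruncatedPrefix { actual: input.len() });
    }
    let (prefix, body) = input.split_at(LENGTH_PREFIX_SIZE);
    // `prefix` is exactly 8 bytes here, so the copy cannot panic.
    let mut raw = [0u8; LENGTH_PREFIX_SIZE];
    raw.copy_from_slice(prefix);
    Ok((i64::from_le_bytes(raw), body))
}
```

With a check of this shape, a 3-byte buffer yields `Err(IpcError::TruncatedPrefix { actual: 3 })` rather than a slice-index panic. In the real reader, the error would presumably be mapped to an appropriate `ArrowError` variant.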
