[PR] ORC-2188: Reject invalid compression block size in PostScript [orc]

via GitHub Sun, 05 Jul 2026 01:30:03 -0700


dongjoon-hyun opened a new pull request, #2662:
URL: https://github.com/apache/orc/pull/2662


   ### What changes were proposed in this pull request?
   
   This PR aims to reject invalid PostScript `compressionBlockSize` values at
   file open in both C++ and Java readers. If the field is present and its value
   is `0` or `>= 2^23` (8 MiB), the readers now throw `ParseError` (C++) /
   `FileFormatException` (Java) with the message
   `Invalid compression block size: N`. If the field is absent, the existing
   256 KiB default still applies.
   
   - **C++**: check added in `getCompressionBlockSize` (`Reader.cc`), which 
covers all open paths.
   - **Java**: new `ReaderImpl.checkCompressionBlockSize` helper, called from 
`extractPostScript`, the `OrcTail` constructor branch, and the deprecated 
`extractFileTail(ByteBuffer)`.
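   The validation rule above can be sketched in Java roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: the class `CompressionBlockSizeCheck`, its `resolve` method, and the use of `OptionalLong` to model a present-or-absent PostScript field are hypothetical, and `FileFormatException` here is a local stand-in for ORC's exception of that name. The sketch assumes the field is a protobuf `uint64`, which Java exposes as a signed `long`, so it compares the value as unsigned. Otherwise a huge value would wrap to a negative number and slip past the upper bound.

   ```java
   import java.io.IOException;
   import java.util.OptionalLong;

   /**
    * Hypothetical sketch of the read-side check described in this PR.
    * Names are illustrative and do not match the ORC sources.
    */
   public class CompressionBlockSizeCheck {
       /** Default applied when the PostScript omits the field (256 KiB). */
       static final long DEFAULT_BLOCK_SIZE = 256L * 1024;

       /** Exclusive upper bound: chunk headers store lengths in 23 bits. */
       static final long MAX_BLOCK_SIZE_EXCLUSIVE = 1L << 23;

       /** Local stand-in for org.apache.orc.FileFormatException. */
       static class FileFormatException extends IOException {
           FileFormatException(String message) {
               super(message);
           }
       }

       /**
        * Returns the block size to use. An absent field yields the default;
        * a present value must lie in [1, 2^23). The value is treated as an
        * unsigned 64-bit integer because protobuf uint64 maps to Java long.
        */
       static long resolve(OptionalLong field) throws FileFormatException {
           if (!field.isPresent()) {
               return DEFAULT_BLOCK_SIZE;
           }
           long size = field.getAsLong();
           if (size == 0 || Long.compareUnsigned(size, MAX_BLOCK_SIZE_EXCLUSIVE) >= 0) {
               throw new FileFormatException(
                   "Invalid compression block size: " + Long.toUnsignedString(size));
           }
           return size;
       }

       public static void main(String[] args) throws FileFormatException {
           System.out.println(resolve(OptionalLong.empty()));   // 262144
           System.out.println(resolve(OptionalLong.of(65536))); // 65536
           try {
               resolve(OptionalLong.of(0));
           } catch (FileFormatException e) {
               System.out.println(e.getMessage()); // Invalid compression block size: 0
           }
       }
   }
   ```

   The unsigned comparison is a design choice of this sketch. The PR description does not say how the real readers treat values that overflow a signed `long`.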
   
   ### Why are the changes needed?
   
   Both writers already enforce `compressionBlockSize < 2^23` because the
   compressed chunk header stores the chunk length in 23 bits, but the readers
   accepted the value without validation. Applying the same bound at read time
   makes a file with a corrupt PostScript fail fast at open, while any file
   produced by a conforming writer is unaffected.
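   For context on the 23-bit bound: the ORC specification describes each compressed chunk header as 3 bytes holding `compressedLength * 2 + isOriginal` in little-endian order, so the largest encodable length is `2^23 - 1`. A chunk that does not shrink under compression is stored as original (uncompressed) data, so its stored length can reach the full block size, and the block size itself must therefore stay below `2^23`. The hypothetical `ChunkHeader` class below is a sketch of that encoding, not ORC's implementation.

   ```java
   /** Hypothetical helpers illustrating ORC's 3-byte compressed chunk header. */
   public class ChunkHeader {
       /** Largest length representable in the 23-bit length field. */
       static final int MAX_CHUNK_LENGTH = (1 << 23) - 1;

       /** Encodes (length * 2 + isOriginal) as 3 little-endian bytes. */
       static byte[] encode(int length, boolean isOriginal) {
           if (length < 0 || length > MAX_CHUNK_LENGTH) {
               throw new IllegalArgumentException(
                   "Chunk length does not fit in 23 bits: " + length);
           }
           int value = (length << 1) | (isOriginal ? 1 : 0);
           return new byte[] {
               (byte) value,
               (byte) (value >>> 8),
               (byte) (value >>> 16)
           };
       }

       /** Reassembles the 24-bit value and drops the low isOriginal bit. */
       static int decodeLength(byte[] header) {
           int value = (header[0] & 0xff)
                   | (header[1] & 0xff) << 8
                   | (header[2] & 0xff) << 16;
           return value >>> 1;
       }

       /** The low bit of the first byte is the isOriginal flag. */
       static boolean decodeIsOriginal(byte[] header) {
           return (header[0] & 1) != 0;
       }

       public static void main(String[] args) {
           byte[] h = encode(100_000, false);
           System.out.printf("%02x %02x %02x%n", h[0], h[1], h[2]); // 40 0d 03
           System.out.println(decodeLength(h));                     // 100000
           try {
               encode(1 << 23, false);
           } catch (IllegalArgumentException e) {
               // Chunk length does not fit in 23 bits: 8388608
               System.out.println(e.getMessage());
           }
       }
   }
   ```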
   
   ### How was this patch tested?
   
   Pass the CIs with newly added test cases.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Fable 5
