dongjoon-hyun opened a new pull request, #2662: URL: https://github.com/apache/orc/pull/2662
### What changes were proposed in this pull request?

This PR aims to reject invalid PostScript `compressionBlockSize` values at file open in both the C++ and Java readers. If the field is present and its value is `0` or `>= 2^23` (8MB), the readers now throw `ParseError` (C++) / `FileFormatException` (Java) with the message `Invalid compression block size: N`. A missing field keeps the existing 256KB default.

- **C++**: the check is added in `getCompressionBlockSize` (`Reader.cc`), which covers all open paths.
- **Java**: a new `ReaderImpl.checkCompressionBlockSize` helper is called from `extractPostScript`, the `OrcTail` constructor branch, and the deprecated `extractFileTail(ByteBuffer)`.

### Why are the changes needed?

Both writers already enforce `compressionBlockSize < 2^23` because the 3-byte compressed chunk header stores the length in 23 bits (the remaining bit is the `isOriginal` flag), but the readers used the value without validation. Applying the same bound at read time fails fast without affecting any legitimate file.

### How was this patch tested?

Pass the CIs with the newly added test cases.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Fable 5
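The `compressionBlockSize` validation described in this PR can be sketched as a minimal standalone Java class. It is not the actual `ReaderImpl` code: `CompressionBlockSizeCheck`, `InvalidBlockSizeException`, `encodeChunkHeader`, and `decodeChunkLength` are hypothetical names, and the field's presence and raw value are passed in directly instead of being read from the generated PostScript message. Because the field is a `uint64`, the sketch compares it as unsigned. The two header helpers illustrate why `2^23` is the bound: the 3-byte little-endian chunk header holds `length * 2 + isOriginal`.

```java
/**
 * Hypothetical, standalone sketch of the PostScript compressionBlockSize check.
 * It does not use ORC's generated protobuf classes; field presence and the raw
 * uint64 value are passed in directly.
 */
public class CompressionBlockSizeCheck {
  /** Default used when the PostScript omits compressionBlockSize (256KB). */
  static final long DEFAULT_BLOCK_SIZE = 256L * 1024;

  /** Exclusive upper bound: a chunk length must fit in 23 bits. */
  static final long MAX_BLOCK_SIZE_EXCLUSIVE = 1L << 23;

  /** Hypothetical stand-in for FileFormatException (Java) / ParseError (C++). */
  static class InvalidBlockSizeException extends RuntimeException {
    InvalidBlockSizeException(String message) {
      super(message);
    }
  }

  /**
   * Returns the block size to use: the default if the field is missing, the
   * value itself if valid. Throws if a present value is 0 or >= 2^23.
   * The value is compared as unsigned because the proto field is uint64.
   */
  static long checkCompressionBlockSize(boolean hasField, long value) {
    if (!hasField) {
      return DEFAULT_BLOCK_SIZE;
    }
    if (value == 0 || Long.compareUnsigned(value, MAX_BLOCK_SIZE_EXCLUSIVE) >= 0) {
      throw new InvalidBlockSizeException(
          "Invalid compression block size: " + Long.toUnsignedString(value));
    }
    return value;
  }

  /** Encodes a 3-byte little-endian chunk header holding (length * 2 + isOriginal). */
  static byte[] encodeChunkHeader(int length, boolean isOriginal) {
    if (length < 0 || length >= MAX_BLOCK_SIZE_EXCLUSIVE) {
      throw new IllegalArgumentException("Chunk length does not fit in 23 bits: " + length);
    }
    int v = length * 2 + (isOriginal ? 1 : 0);
    return new byte[] {(byte) v, (byte) (v >>> 8), (byte) (v >>> 16)};
  }

  /** Decodes the chunk length from a 3-byte header, dropping the isOriginal bit. */
  static int decodeChunkLength(byte[] header) {
    int v = (header[0] & 0xff) | ((header[1] & 0xff) << 8) | ((header[2] & 0xff) << 16);
    return v >>> 1;
  }

  public static void main(String[] args) {
    System.out.println(checkCompressionBlockSize(false, 0));     // 262144 (default)
    System.out.println(checkCompressionBlockSize(true, 262144)); // 262144
    int max = (int) MAX_BLOCK_SIZE_EXCLUSIVE - 1;                // 8388607, largest valid size
    System.out.println(decodeChunkLength(encodeChunkHeader(max, false)) == max); // true
    for (long bad : new long[] {0L, 1L << 23, -1L}) {
      try {
        checkCompressionBlockSize(true, bad);
      } catch (InvalidBlockSizeException e) {
        System.out.println(e.getMessage());
      }
    }
  }
}
```

Run as-is, `main` prints the 256KB default twice, `true` for the round-trip of the largest 23-bit length, and then the rejection message for `0`, `8388608`, and `18446744073709551615` (a `uint64` value that would be negative if read as a signed `long`).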
