ffacs opened a new pull request, #2622:
URL: https://github.com/apache/orc/pull/2622
### What changes were proposed in this pull request?
This PR validates string lengths decoded by the C++ direct string reader.
`StringDirectColumnReader` now rejects negative string lengths and detects
`size_t` overflow while accumulating the total string data size. The skip
path also checks for overflow when accumulating sizes across chunks.
Regression tests were added for negative lengths and length-sum overflow
in direct string encoding.
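
For readers who want a concrete picture of the validation described above, here is a minimal sketch, not the code from this PR. The helper name `checkedTotalLength`, its `initial` parameter, and the local `ParseError` stand-in are assumptions made so the example compiles on its own; the real reader uses ORC's own exception type and buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the reader's ParseError, defined locally so the sketch is self-contained.
class ParseError : public std::runtime_error {
 public:
  explicit ParseError(const std::string& what) : std::runtime_error(what) {}
};

// Hypothetical helper: sums decoded lengths, rejecting negative values and
// any sum that would not fit in size_t. `initial` carries a running total,
// which is how a skip path could accumulate sizes across chunks.
size_t checkedTotalLength(const std::vector<int64_t>& lengths, size_t initial = 0) {
  size_t total = initial;
  for (int64_t length : lengths) {
    if (length < 0) {
      throw ParseError("negative string length in LENGTH stream");
    }
    // Compare in uint64_t so the check also holds where size_t is 32 bits wide.
    const uint64_t len = static_cast<uint64_t>(length);
    const uint64_t remaining = std::numeric_limits<size_t>::max() - total;
    if (len > remaining) {
      throw ParseError("string length sum overflows size_t");
    }
    total += static_cast<size_t>(len);
  }
  return total;
}
```

The overflow test subtracts from the maximum before adding, so the check itself never wraps around.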
### Why are the changes needed?
Malformed ORC files can provide invalid values in the LENGTH stream for
direct-encoded string columns. Negative lengths or overflowing length sums
can cause the reader to allocate an incorrectly sized buffer and then copy
more data than the allocation can hold.
The new validation makes malformed input fail cleanly with `ParseError`.
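
To illustrate the failure mode, the sketch below shows how an unchecked sum can come out smaller than a single string in the batch. It is an illustration only and is not taken from the ORC sources; `uncheckedTotalLength` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical unchecked accumulation, shown only to demonstrate the problem.
size_t uncheckedTotalLength(const std::vector<int64_t>& lengths) {
  size_t total = 0;
  for (int64_t length : lengths) {
    // A negative value converts to a very large unsigned value (modulo 2^N),
    // and unsigned addition then wraps silently.
    total += static_cast<size_t>(length);
  }
  return total;
}
```

With the lengths `{10, -8}`, the wrapped sum is 2, so a buffer sized from that total would be far too small for the 10-byte first string.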
### How was this patch tested?
Ran:
```bash
cmake --build build --target orc-test -j 8
build/c++/test/orc-test \
  '--gtest_filter=TestColumnReader.testStringDirectRejects*'
build/c++/test/orc-test \
  '--gtest_filter=TestColumnReader.testStringDirect*:OrcColumnReaderTest/TestColumnReaderEncoded.testStringDirect*'
```
All selected tests passed.
### Was this patch authored or co-authored using generative AI tooling?
Yes. Generated with OpenAI Codex.