ffacs opened a new pull request, #2622:
URL: https://github.com/apache/orc/pull/2622

   ### What changes were proposed in this pull request?
   
     This PR validates the string lengths that the C++ direct string reader decodes from the LENGTH stream. `StringDirectColumnReader` now rejects negative string lengths and detects `size_t` overflow while accumulating the total string data size. The skip path also checks for overflow when accumulating sizes across chunks.
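     A minimal sketch of the two checks described above, assuming the lengths for one batch have already been decoded from the LENGTH stream into an `int64_t` array. `checkedTotalLength` is a hypothetical helper, not the actual code in `StringDirectColumnReader`, and `ParseError` here is a local stand-in for the ORC `ParseError` exception.

   ```cpp
   #include <cstddef>
   #include <cstdint>
   #include <limits>
   #include <stdexcept>
   #include <string>

   // Local stand-in for the ORC ParseError exception (assumption, for a self-contained sketch).
   struct ParseError : std::runtime_error {
     using std::runtime_error::runtime_error;
   };

   // Hypothetical helper: sums the decoded lengths of one batch, rejecting
   // negative values and any running sum that would not fit in size_t.
   inline size_t checkedTotalLength(const int64_t* lengths, size_t numValues) {
     size_t total = 0;
     for (size_t i = 0; i < numValues; ++i) {
       const int64_t len = lengths[i];
       if (len < 0) {
         throw ParseError("Negative string length " + std::to_string(len));
       }
       // Convert only after the sign check; uint64_t also covers 32-bit size_t.
       const uint64_t ulen = static_cast<uint64_t>(len);
       // Compare against the remaining headroom so the check itself cannot wrap.
       if (ulen > std::numeric_limits<size_t>::max() - total) {
         throw ParseError("String length sum overflows size_t");
       }
       total += static_cast<size_t>(ulen);
     }
     return total;
   }
   ```

     The overflow test is written as `len > SIZE_MAX - total` rather than `total + len > SIZE_MAX`, because the second form would already have wrapped around before the comparison and could never fire. Only a total that passes both checks would then be used to size the data buffer.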
   
     Regression tests were added for negative lengths and length-sum overflow in direct string encoding.
   
   ### Why are the changes needed?
   
     Malformed ORC files can provide invalid values in the LENGTH stream for direct-encoded string columns. Negative lengths or overflowing length sums can cause the reader to allocate an incorrectly sized buffer and then copy more data than the allocation can hold.
   
     The new validation makes malformed input fail cleanly with `ParseError`.
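     For the skip path, the running total has to be carried across chunks: checking each chunk in isolation would miss a sum that only overflows once several chunks are combined. The sketch below illustrates this under stated assumptions; `bytesToSkip` is a hypothetical helper, a `std::vector<int64_t>` stands in for the decoded LENGTH stream, and `ParseError` is again a local stand-in for the ORC exception.

   ```cpp
   #include <algorithm>
   #include <cstddef>
   #include <cstdint>
   #include <limits>
   #include <stdexcept>
   #include <vector>

   // Local stand-in for the ORC ParseError exception (assumption).
   struct ParseError : std::runtime_error {
     using std::runtime_error::runtime_error;
   };

   // Hypothetical skip-path sketch: returns how many DATA-stream bytes to skip
   // for the next numRows strings, consuming lengths chunkSize at a time.
   inline size_t bytesToSkip(const std::vector<int64_t>& lengthStream,
                             size_t numRows, size_t chunkSize) {
     if (chunkSize == 0) {
       throw std::invalid_argument("chunkSize must be positive");
     }
     size_t total = 0;  // carried across chunks, never reset per chunk
     size_t pos = 0;
     while (pos < numRows) {
       const size_t n = std::min(chunkSize, numRows - pos);
       for (size_t i = 0; i < n; ++i) {
         // at() throws std::out_of_range if the stream holds fewer than numRows lengths.
         const int64_t len = lengthStream.at(pos + i);
         if (len < 0) {
           throw ParseError("Negative string length while skipping");
         }
         const uint64_t ulen = static_cast<uint64_t>(len);
         if (ulen > std::numeric_limits<size_t>::max() - total) {
           throw ParseError("String length sum overflows size_t while skipping");
         }
         total += static_cast<size_t>(ulen);
       }
       pos += n;
     }
     return total;
   }
   ```

     On a 64-bit platform, calling `bytesToSkip` with `chunkSize = 1` on three lengths of `INT64_MAX` keeps every individual chunk in range, yet the third addition raises `ParseError` instead of silently wrapping the skip distance.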
   
   ### How was this patch tested?
   
     Ran:
   
   ```bash
   cmake --build build --target orc-test -j 8
   build/c++/test/orc-test '--gtest_filter=TestColumnReader.testStringDirectRejects*'
   build/c++/test/orc-test '--gtest_filter=TestColumnReader.testStringDirect*:OrcColumnReaderTest/TestColumnReaderEncoded.testStringDirect*'
   ```
     All selected tests passed.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
     Yes. Generated with OpenAI Codex.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
