mrhhsg opened a new pull request, #63617:
URL: https://github.com/apache/doris/pull/63617
### What problem does this PR solve?
Issue Number: None
Problem Summary: Dictionary-encoded string pages use codewords from the data
page to index the page dictionary during decoding. Corrupted codeword data can
reference entries outside the dictionary and lead to out-of-bounds dictionary
reads. This PR validates dictionary codewords against the dictionary size
before materializing string columns, predicate columns, and dictionary
columns, and before serving offset-only length reads. Invalid codewords now
fail with a corruption status instead of indexing outside the dictionary.
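The bounds check described above can be sketched in C++ as follows. This is a minimal illustration, not the PR's patch. It assumes signed 32-bit codewords and a dictionary held as a `std::vector<std::string_view>`. `DecodeStatus`, `validate_dict_codewords`, `decode_dict_strings`, and `decode_dict_lengths` are hypothetical names; they are not the Doris BE API, which has its own `Status` type and column classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for the BE's Status type.
struct DecodeStatus {
    bool ok;
    std::string message;
    static DecodeStatus OK() { return {true, {}}; }
    static DecodeStatus Corruption(std::string msg) { return {false, std::move(msg)}; }
};

// Reject any codeword that is negative or not smaller than the dictionary size,
// before a single dictionary lookup takes place.
inline DecodeStatus validate_dict_codewords(const int32_t* codewords, size_t num,
                                            size_t dict_size) {
    for (size_t i = 0; i < num; ++i) {
        const int32_t cw = codewords[i];
        // Rule out negatives first so the size_t comparison cannot wrap around.
        if (cw < 0 || static_cast<size_t>(cw) >= dict_size) {
            return DecodeStatus::Corruption("invalid dict codeword " + std::to_string(cw) +
                                            " at index " + std::to_string(i) +
                                            ", dict size " + std::to_string(dict_size));
        }
    }
    return DecodeStatus::OK();
}

// Materialize string values only after the whole batch has been validated.
inline DecodeStatus decode_dict_strings(const std::vector<std::string_view>& dict,
                                        const int32_t* codewords, size_t num,
                                        std::vector<std::string_view>* out) {
    DecodeStatus st = validate_dict_codewords(codewords, num, dict.size());
    if (!st.ok) return st;  // `out` is left untouched on corruption
    out->reserve(out->size() + num);
    for (size_t i = 0; i < num; ++i) {
        out->push_back(dict[static_cast<size_t>(codewords[i])]);
    }
    return st;
}

// Offset-only read: only value lengths are needed, but the lookup still
// indexes the dictionary, so it needs the same validation.
inline DecodeStatus decode_dict_lengths(const std::vector<std::string_view>& dict,
                                        const int32_t* codewords, size_t num,
                                        std::vector<uint32_t>* lengths) {
    DecodeStatus st = validate_dict_codewords(codewords, num, dict.size());
    if (!st.ok) return st;
    lengths->reserve(lengths->size() + num);
    for (size_t i = 0; i < num; ++i) {
        lengths->push_back(static_cast<uint32_t>(dict[static_cast<size_t>(codewords[i])].size()));
    }
    return st;
}
```

In this sketch the whole batch is validated before the first lookup, so a corrupt page yields an error and leaves the output container unchanged instead of partially filled. Checking each codeword inside the copy loop would also be safe, but it places a branch on every lookup and can leave partial output behind on failure.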
This PR also updates a stale use of a column array view unit test helper,
which was needed for the build to pass on current master after rebasing.
### Release note
None
### Check List (For Author)
- Test:
    - Unit Test: `./run-be-ut.sh --run --filter=ColumnStringTest.insert_many_dict_data*:PredicateColumnTest.InsertManyDictData*:ColumnDictionaryTest.insert_many_dict_data*:BinaryDictPageTest.TestRejectInvalidDictCodeword* -j 16`
    - Manual test: `build-support/clang-format.sh`
    - Manual test: `build-support/check-format.sh`
    - Manual test: `git diff --check`
- Behavior changed: Yes. Corrupted dictionary-encoded string pages with
invalid codewords are rejected as corruption instead of being decoded.
- Does this need documentation: No