github-actions[bot] commented on code in PR #64317:
URL: https://github.com/apache/doris/pull/64317#discussion_r3381670423
##########
be/src/format/parquet/decoder.h:
##########
@@ -152,6 +152,20 @@ class BaseDictDecoder : public Decoder {
return Status::OK();
}
+ // The index bit width is read from the data page and is fully attacker controlled,
+ // so a decoded index may point past the dictionary. Reject it before it is used to
+ // look up _dict_items.
+ Status _check_dict_indexes(size_t dict_size) {
Review Comment:
This still leaves the untrusted bit width unchecked before
`_check_dict_indexes()` runs. `BaseDictDecoder::set_data()` reads the first
data-page byte and constructs `RleBatchDecoder<uint32_t>` with it; then both
callers invoke `GetBatch()` before this helper. For a crafted page with
`bit_width > 32` and a repeated run, `RleBatchDecoder::NextCounts()` calls
`BatchedBitReader::GetBytes<uint32_t>(BitUtil::Ceil(bit_width, 8),
&repeated_value_)`. The `num_bytes <= sizeof(T)` guard there is only a
`DCHECK`, so release builds can `memcpy` 5+ bytes into a 4-byte `uint32_t`
before the new bounds check is reached. Literal runs with widths above 32 can
also get truncated or fail with zero-filled `_indexes`, so this does not
reliably reject the malformed index stream.
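
To make the byte-count arithmetic above concrete, here is a minimal standalone sketch. It assumes `BitUtil::Ceil(bit_width, 8)` behaves as ordinary ceiling division; `ceil_div` and `repeated_value_bytes` are hypothetical names used only for illustration and are not Doris APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for BitUtil::Ceil: ceiling division (assumption).
constexpr std::size_t ceil_div(std::size_t value, std::size_t divisor) {
    return (value + divisor - 1) / divisor;
}

// Number of bytes a repeated-run value occupies for a given index bit width.
constexpr std::size_t repeated_value_bytes(std::size_t bit_width) {
    return ceil_div(bit_width, 8);
}

// A width of 32 still fits in a uint32_t destination...
static_assert(repeated_value_bytes(32) == sizeof(std::uint32_t), "32 bits -> 4 bytes");
// ...but any width of 33 or more needs at least 5 bytes, one more than the
// 4-byte destination can hold.
static_assert(repeated_value_bytes(33) > sizeof(std::uint32_t), "33 bits -> 5 bytes");
```

Because the width comes from a single page byte, it can be as large as 255, which under the same assumption corresponds to a 32-byte copy.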
Please validate the dictionary index bit width in
`BaseDictDecoder::set_data()` before constructing or using the RLE decoder.
For example, reject empty page data and any width greater than
`sizeof(uint32_t) * CHAR_BIT`. Also add decoder-level negative tests to the
existing byte-array and fixed-length dict decoder tests.
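
A rough sketch of the kind of check being requested, written as a free function so it compiles on its own. The names `BitWidthCheck`, `kMaxIndexBitWidth`, and `check_dict_index_bit_width` are hypothetical; in the real decoder the result would presumably be mapped to a `Status` error inside `set_data()` before the RLE decoder is constructed.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for illustration; the real code would return a Status.
enum class BitWidthCheck { kOk, kEmptyPage, kWidthTooLarge };

// Indexes are decoded into uint32_t, so wider values cannot be represented.
constexpr std::size_t kMaxIndexBitWidth = sizeof(std::uint32_t) * CHAR_BIT;

// Validates the leading bit-width byte of a dictionary-encoded data page.
// `data`/`size` describe the raw page bytes (assumed layout: width byte first).
inline BitWidthCheck check_dict_index_bit_width(const std::uint8_t* data, std::size_t size) {
    if (data == nullptr || size == 0) {
        return BitWidthCheck::kEmptyPage;  // no byte to read the width from
    }
    const std::size_t bit_width = data[0];
    if (bit_width > kMaxIndexBitWidth) {
        return BitWidthCheck::kWidthTooLarge;  // would overflow a uint32_t index
    }
    return BitWidthCheck::kOk;
}
```

A negative test could then feed a page whose first byte is 33 and expect a rejection before any call to `GetBatch()`.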
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]