sahvx655-wq commented on PR #64248: URL: https://github.com/apache/doris/pull/64248#issuecomment-4655931451
To map this onto the template: the problem is a heap out-of-bounds write in `HyperLogLog::deserialize`. When the blob is `HLL_DATA_SPARSE`, the loop reads a uint16 register index per entry and writes `_registers[register_idx]`, but `_registers` is a fixed array of `HLL_REGISTERS_COUNT` (16384) entries. The only guard ahead of it is `is_valid()`, which I traced through: it checks only the slice length (`5 + 3*num_registers`) and never the index itself, so any index from 16384 up to 65535 sails straight through. I came at this from the column deserialisation paths that feed serialised HLL bytes back in, and a guard-page run confirmed that the unpatched loop faults on the very first entry with index 16384 while `is_valid()` still reports the blob as well formed. The fix bounds the index where the write actually happens and returns false on an out-of-range entry; that is the only layer that catches every caller, since several reach `deserialize` without going through `is_valid()`. There is no behaviour change for valid data. Points 3 to 5 of the template don't apply: this is a bug fix with no new feature, refactor or optimisation. Left unpatched, it's an attacker-influenced OOB write of a controlled byte up to roughly 49KB past the allocation, so it sits at the corruption end of the spectrum rather than being a benign crash, which is why I kept the diff to the single bound check plus the regression test.
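For readers who want the shape of the bug without opening the diff, here is a minimal standalone sketch of the gap between a length-only check and a per-entry index check. It is not the Doris code: `length_looks_valid`, `deserialize_sparse`, `make_sparse_blob`, the tag value `2` and the little-endian field encoding are all assumptions made for illustration. Only the register count (16384), the uint16 index and the `5 + 3*num_registers` length come from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the Doris definitions.
constexpr std::size_t kRegisterCount = 16384;  // HLL_REGISTERS_COUNT per the comment
constexpr std::uint8_t kSparseTag = 2;         // assumed tag value
constexpr std::size_t kHeaderSize = 5;         // 1 tag byte + 4 count bytes
constexpr std::size_t kEntrySize = 3;          // uint16 index + 1 value byte

// Farthest a uint16 index can land beyond the last valid register (16383).
static_assert(std::numeric_limits<std::uint16_t>::max() - (kRegisterCount - 1) == 49152,
              "a uint16 index overshoots by at most 49152 one-byte registers");

// Assumed little-endian encoding of the entry count.
std::uint32_t read_u32_le(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Length-only validation, mirroring the check described for is_valid():
// it never looks at the register indices themselves.
bool length_looks_valid(const std::vector<std::uint8_t>& blob) {
    if (blob.size() < kHeaderSize || blob[0] != kSparseTag) return false;
    const std::uint64_t n = read_u32_le(blob.data() + 1);
    return blob.size() == kHeaderSize + kEntrySize * n;
}

// Sparse decode with the bound applied at the point of the write.
// Returns false on the first out-of-range index; registers written before
// that point are left as they are.
bool deserialize_sparse(const std::vector<std::uint8_t>& blob,
                        std::vector<std::uint8_t>& registers) {
    if (!length_looks_valid(blob)) return false;
    registers.assign(kRegisterCount, 0);
    const std::uint32_t n = read_u32_le(blob.data() + 1);
    const std::uint8_t* p = blob.data() + kHeaderSize;
    for (std::uint32_t i = 0; i < n; ++i, p += kEntrySize) {
        const std::uint16_t idx =
            static_cast<std::uint16_t>(p[0] | (static_cast<unsigned>(p[1]) << 8));
        if (idx >= kRegisterCount) return false;  // the single added bound check
        registers[idx] = p[2];
    }
    return true;
}

// Builds a one-entry sparse blob in the assumed layout, for experimentation.
std::vector<std::uint8_t> make_sparse_blob(std::uint16_t idx, std::uint8_t value) {
    return {kSparseTag, 1, 0, 0, 0,
            static_cast<std::uint8_t>(idx & 0xFF),
            static_cast<std::uint8_t>(idx >> 8),
            value};
}
```

In this sketch a blob built with `make_sparse_blob(16384, 1)` passes `length_looks_valid` but is rejected by `deserialize_sparse`, which is the same gap the guard-page run exposed in the unpatched loop.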
