sahvx655-wq commented on PR #64248:
URL: https://github.com/apache/doris/pull/64248#issuecomment-4655931451

   To map this onto the template: the problem is a heap out-of-bounds write in 
`HyperLogLog::deserialize`. When the blob is `HLL_DATA_SPARSE` the loop reads a 
uint16 register index per entry and writes `_registers[register_idx]`, but 
`_registers` is a fixed `HLL_REGISTERS_COUNT` (16384) array. The only guard 
ahead of it is `is_valid()`, which, as far as I traced it, checks only the 
slice length (`5 + 3*num_registers`) and never the index itself, so any index 
from 16384 up to 65535 sails straight through. I came at this from the column 
deserialisation paths that feed serialised HLL bytes back in, and a guard-page 
run confirmed the unpatched loop faults on the very first index of 16384 while 
`is_valid()` still reports the blob as well formed.
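
   To make the gap concrete, here is a minimal standalone sketch, not the 
Doris code. The helper names, the tag value 2 for `HLL_DATA_SPARSE` and the 
little-endian encoding are my assumptions for illustration. It builds a 
one-entry sparse blob whose index is 16384 and shows that a length-only check 
accepts it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical constants for illustration; not copied from the Doris headers.
constexpr uint8_t kSparseTag = 2;         // assumed value of HLL_DATA_SPARSE
constexpr size_t kRegisterCount = 16384;  // HLL_REGISTERS_COUNT

// Build a sparse blob: 1 tag byte, a 4-byte entry count, then 3 bytes per
// entry (uint16 index + uint8 value). Little-endian encoding is assumed.
std::vector<uint8_t> make_sparse_blob(
        const std::vector<std::pair<uint16_t, uint8_t>>& entries) {
    std::vector<uint8_t> blob;
    blob.push_back(kSparseTag);
    const uint32_t n = static_cast<uint32_t>(entries.size());
    for (int shift = 0; shift < 32; shift += 8) {
        blob.push_back(static_cast<uint8_t>(n >> shift));
    }
    for (const auto& entry : entries) {
        blob.push_back(static_cast<uint8_t>(entry.first & 0xFF));
        blob.push_back(static_cast<uint8_t>(entry.first >> 8));
        blob.push_back(entry.second);
    }
    assert(blob.size() == 5 + 3 * entries.size());
    return blob;
}

// Length-only validation: checks the tag and total size, never the indices.
bool length_only_is_valid(const std::vector<uint8_t>& blob) {
    if (blob.size() < 5 || blob[0] != kSparseTag) return false;
    const uint32_t n = static_cast<uint32_t>(blob[1]) |
                       (static_cast<uint32_t>(blob[2]) << 8) |
                       (static_cast<uint32_t>(blob[3]) << 16) |
                       (static_cast<uint32_t>(blob[4]) << 24);
    return blob.size() == 5 + 3 * static_cast<size_t>(n);
}

// Returns the first register index >= kRegisterCount, or -1 if every index
// in the blob is in range.
long first_out_of_range_index(const std::vector<uint8_t>& blob) {
    for (size_t off = 5; off + 3 <= blob.size(); off += 3) {
        const uint16_t idx =
                static_cast<uint16_t>(blob[off] | (blob[off + 1] << 8));
        if (idx >= kRegisterCount) return idx;
    }
    return -1;
}
```

Calling `make_sparse_blob({{16384, 1}})` gives an 8-byte blob that 
`length_only_is_valid` accepts while `first_out_of_range_index` reports 16384.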
   
   No behaviour change for valid data. The fix bounds the index at the point 
where the write actually happens and returns false on an out-of-range entry. 
That is the only layer that catches every caller, since several reach 
`deserialize` without going through `is_valid()`. Points 3 to 5 of the 
template don't apply: this is a bug fix with no new feature, refactor or 
optimisation. Left unpatched it's an 
attacker-influenced OOB write of a controlled byte up to roughly 49KB past the 
allocation, so it sits at the corruption end rather than a benign crash, which 
is why I kept the diff to the single bound check plus the regression test.
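
   The shape of the bound check, again as a standalone sketch under my own 
assumptions rather than the actual diff: `load_sparse_entries` is a 
hypothetical stand-in for the sparse branch of `deserialize`, and the 
little-endian index encoding is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kRegisterCount = 16384;  // HLL_REGISTERS_COUNT

// Hypothetical stand-in for the sparse branch of deserialize. `entries`
// points at num_entries records of 3 bytes each (uint16 index, uint8 value).
// Returns false as soon as an index falls outside the register array.
bool load_sparse_entries(const uint8_t* entries, uint32_t num_entries,
                         std::array<uint8_t, kRegisterCount>& registers) {
    assert(entries != nullptr || num_entries == 0);
    for (uint32_t i = 0; i < num_entries; ++i) {
        const uint8_t* p = entries + 3 * static_cast<size_t>(i);
        const uint16_t idx = static_cast<uint16_t>(p[0] | (p[1] << 8));
        if (idx >= kRegisterCount) {
            return false;  // reject instead of writing past the array
        }
        registers[idx] = p[2];
    }
    return true;
}
```

In this sketch, entries that precede the bad one have already been written 
when the function returns false, so a caller should treat the registers as 
unusable on failure.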


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


