sahvx655-wq opened a new pull request, #64248: URL: https://github.com/apache/doris/pull/64248
When HyperLogLog::deserialize reads an HLL_DATA_SPARSE blob, its loop pulls a uint16 register index and an 8-bit value per entry and writes straight into _registers, a 16384-byte array sized to HLL_REGISTERS_COUNT. The only guard ahead of it is is_valid(), which validates the slice length (5 + 3*num_registers) but never inspects the index value, so any index from 16384 up to 65535 passes through.

I traced this back from the column deserialisation paths that feed serialised HLL bytes back in. A crafted sparse entry lands the write up to roughly 49KB beyond the allocation, with both the offset and the byte written under the caller's control.

The index has to be bounded where the write happens. is_valid() is deliberately O(1), and several callers reach deserialize without going through it, so moving the check there would miss them. Rejecting an out-of-range index inside deserialize closes the write on every path.

A guard-page run confirmed that the unpatched code faults on the very first entry with index 16384, while is_valid() still reports the blob as well formed. Left alone, this is a heap out-of-bounds write reachable from attacker-influenced HLL data, which puts it at the memory-corruption end of the scale rather than a benign crash.
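For illustration only, here is a minimal standalone sketch of the pattern described above: a length-only validity check next to a sparse decode loop that rejects an out-of-range index before writing. It is not the Doris code or the actual patch. The names `kRegisterCount`, `kSparseTag`, `SparseEntry`, `make_sparse_blob`, `length_is_valid`, and `deserialize_sparse` are hypothetical, the tag value is made up, and the layout (1 type byte, 4-byte entry count, then 3 bytes per entry in host byte order) is assumed from the `5 + 3*num_registers` length formula.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the values named in the description.
constexpr std::size_t kRegisterCount = 16384;  // plays the role of HLL_REGISTERS_COUNT
constexpr std::uint8_t kSparseTag = 2;         // assumed tag value, illustrative only

struct SparseEntry {
    std::uint16_t index;
    std::uint8_t value;
};

// Builds a blob in the assumed layout: tag, 4-byte count, then (index, value) entries.
std::vector<std::uint8_t> make_sparse_blob(const std::vector<SparseEntry>& entries) {
    std::vector<std::uint8_t> blob(5 + 3 * entries.size());
    blob[0] = kSparseTag;
    const std::uint32_t n = static_cast<std::uint32_t>(entries.size());
    std::memcpy(blob.data() + 1, &n, sizeof(n));
    std::uint8_t* p = blob.data() + 5;
    for (const SparseEntry& e : entries) {
        std::memcpy(p, &e.index, sizeof(e.index));
        p[2] = e.value;
        p += 3;
    }
    return blob;
}

// O(1) length-only check: it never looks at the index values.
bool length_is_valid(const std::uint8_t* data, std::size_t size) {
    if (size < 5 || data[0] != kSparseTag) return false;
    std::uint32_t n = 0;
    std::memcpy(&n, data + 1, sizeof(n));
    return static_cast<std::uint64_t>(size) == 5 + 3 * static_cast<std::uint64_t>(n);
}

// Decode loop that bounds the index at the point of the write.
bool deserialize_sparse(const std::uint8_t* data, std::size_t size,
                        std::vector<std::uint8_t>& registers) {
    if (!length_is_valid(data, size)) return false;
    std::uint32_t n = 0;
    std::memcpy(&n, data + 1, sizeof(n));
    registers.assign(kRegisterCount, 0);
    const std::uint8_t* p = data + 5;
    for (std::uint32_t i = 0; i < n; ++i, p += 3) {
        std::uint16_t index = 0;
        std::memcpy(&index, p, sizeof(index));
        if (index >= kRegisterCount) return false;  // reject before any write
        registers[index] = p[2];
    }
    return true;
}
```

In this sketch a blob whose single entry carries index 16384 passes `length_is_valid` but is rejected by `deserialize_sparse`, which mirrors the gap between is_valid() and the write described above. Keeping the check inside the loop means callers that skip the length check are still covered.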
