Bankim Bhavsar created KUDU-2968:
------------------------------------
Summary: RleDecoder::GetNextRun() may attempt decoding past the
last byte leading to assertion failure
Key: KUDU-2968
URL: https://issues.apache.org/jira/browse/KUDU-2968
Project: Kudu
Issue Type: Bug
Components: util
Reporter: Bankim Bhavsar
Assignee: Bankim Bhavsar
RLE encoding may encode "literally" when it doesn't find sufficient repeated
values.
See
[https://github.com/apache/kudu/blob/master/src/kudu/util/rle-encoding.h#L28]
Consider a scenario where consecutive (non-repeated) integers are encoded
using RLE encoding. In that case the values are encoded literally. The literal
count is encoded as well, and it is always a multiple of 8.
When the number of values is not a multiple of 8, the literal count is rounded
up to the next multiple of 8.
For example, if the number of values is 100, then literal_count is 104, but
max_bytes is correctly set to 100 for the int8 datatype.
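The arithmetic in the example above can be sketched as follows. This is only an illustration: {{RoundUpToGroupOf8}} and {{PhantomLiterals}} are hypothetical helper names, not functions from rle-encoding.h, and the sketch assumes one byte per value (int8) with no header bytes counted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Kudu): rounds a value count up to the
// next multiple of 8, matching the rounding of the literal count described
// in this issue.
inline int64_t RoundUpToGroupOf8(int64_t num_values) {
  assert(num_values >= 0);
  return (num_values + 7) / 8 * 8;
}

// Hypothetical helper: number of literals the encoded count advertises
// beyond the values that were actually written.
inline int64_t PhantomLiterals(int64_t num_values) {
  return RoundUpToGroupOf8(num_values) - num_values;
}
```

For 100 values, {{RoundUpToGroupOf8(100)}} gives 104 and {{PhantomLiterals(100)}} gives 4, which is the leftover literal_count discussed next.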
In this scenario, after the last value has been read and {{ret}} is 0,
literal_count still remains at 4.
Hence the next {{GetValue}} returns false, since it is trying to read beyond
{{max_bytes}}, and the {{DCHECK}} on its result fails.
https://github.com/apache/kudu/blob/master/src/kudu/util/rle-encoding.h#L319
{code}
DCHECK(literal_count_ > 0);
if (ret == 0) {
bool has_more = bit_reader_.GetValue(bit_width_, val);
DCHECK(has_more);
literal_count_--;
ret++;
rem--;
}
while (literal_count_ > 0) {
bool result = bit_reader_.GetValue(bit_width_, &current_value_);
DCHECK(result);
if (current_value_ != *val || rem == 0) {
bit_reader_.Rewind(bit_width_);
return ret;
}
ret++;
rem--;
literal_count_--;
}
}
}
{code}
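The condition described above can be reproduced with a small standalone sketch. It does not use Kudu's {{BitReader}} or {{RleDecoder}}; {{ToyReader}}, {{DrainResult}} and {{DrainLiterals}} are hypothetical names, and the sketch assumes byte-wide values with the literal count rounded up to a multiple of 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a byte-aligned reader. This is NOT Kudu's BitReader;
// it only models "return false instead of reading past max_bytes".
class ToyReader {
 public:
  explicit ToyReader(const std::vector<uint8_t>& buf) : buf_(buf) {}

  bool GetValue(uint8_t* out) {
    if (pos_ >= buf_.size()) return false;  // would read past the last byte
    *out = buf_[pos_++];
    return true;
  }

 private:
  std::vector<uint8_t> buf_;  // private copy of the encoded bytes
  size_t pos_ = 0;
};

struct DrainResult {
  int values_read;         // values successfully decoded
  int literal_count_left;  // advertised literals still outstanding
  bool next_read_ok;       // outcome of one further GetValue call
};

// Decodes num_values literals while tracking a literal count that was
// rounded up to a multiple of 8, then attempts one more read.
DrainResult DrainLiterals(int num_values) {
  assert(num_values >= 0);
  std::vector<uint8_t> buf(static_cast<size_t>(num_values));
  for (int i = 0; i < num_values; ++i) buf[i] = static_cast<uint8_t>(i);

  int literal_count = (num_values + 7) / 8 * 8;  // assumed rounding rule
  ToyReader reader(buf);
  uint8_t value = 0;
  int read = 0;
  while (read < num_values && reader.GetValue(&value)) {
    ++read;
    --literal_count;
  }
  bool next_ok = reader.GetValue(&value);
  return {read, literal_count, next_ok};
}
```

With 100 values, {{DrainLiterals(100)}} reports 100 values read, a leftover literal count of 4, and a failed follow-up read. That is the state in which {{DCHECK(literal_count_ > 0)}} passes but the subsequent {{GetValue}} returns false.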
--
This message was sent by Atlassian Jira
(v8.3.4#803005)