Bankim Bhavsar created KUDU-2968:
------------------------------------

             Summary: RleDecoder::GetNextRun() may attempt decoding past the 
last byte leading to assertion failure
                 Key: KUDU-2968
                 URL: https://issues.apache.org/jira/browse/KUDU-2968
             Project: Kudu
          Issue Type: Bug
          Components: util
            Reporter: Bankim Bhavsar
            Assignee: Bankim Bhavsar


RLE encoding may encode "literally" when it doesn't find sufficient repeated 
values.

See
[https://github.com/apache/kudu/blob/master/src/kudu/util/rle-encoding.h#L28]

Consider a scenario where consecutive (non-repeated) integers are encoded 
using RLE encoding. In that case the values are encoded literally. The literal 
count is encoded as well, and it is a multiple of 8.

When the number of values is not a multiple of 8, the literal count is rounded 
up to the next multiple of 8.

For example, if the number of values is 100, then literal_count is 104 but 
max_bytes is correctly set to 100 for the int8 datatype.
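
The rounding arithmetic can be sketched as follows. This is only an illustration: {{RoundUpToGroupOf8}} and {{PaddingSlots}} are hypothetical helper names, not functions from rle-encoding.h.

```cpp
// Hypothetical helper (not part of Kudu): rounds a value count up to the
// next multiple of 8, mirroring how the literal count is described above.
constexpr int RoundUpToGroupOf8(int num_values) {
  return ((num_values + 7) / 8) * 8;
}

// Hypothetical helper: number of literal slots that have no backing data
// because the real value count was not a multiple of 8.
constexpr int PaddingSlots(int num_values) {
  return RoundUpToGroupOf8(num_values) - num_values;
}

// With 100 values the literal count becomes 104, leaving 4 padding slots.
static_assert(RoundUpToGroupOf8(100) == 104, "100 values round up to 104");
static_assert(PaddingSlots(100) == 4, "4 slots have no data behind them");
```

The padding slots are what the decoder still believes it can read after the real data has been consumed.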

In this scenario, after reading the last value when {{ret}} is 0, literal_count 
still remains at 4.
Hence the next {{GetValue}} returns false, since it's trying to read beyond 
{{max_bytes}}.
https://github.com/apache/kudu/blob/master/src/kudu/util/rle-encoding.h#L319
{code}
      DCHECK(literal_count_ > 0);
      if (ret == 0) {
        bool has_more = bit_reader_.GetValue(bit_width_, val);
        DCHECK(has_more);
        literal_count_--;
        ret++;
        rem--;
      }

      while (literal_count_ > 0) {
        bool result = bit_reader_.GetValue(bit_width_, &current_value_);
        DCHECK(result);
        if (current_value_ != *val || rem == 0) {
          bit_reader_.Rewind(bit_width_);
          return ret;
        }
        ret++;
        rem--;
        literal_count_--;
      }
    }
  }
{code}
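
To make the failure mode concrete, here is a toy model of the loop above. It is a sketch under simplifying assumptions, not Kudu's implementation: it assumes one byte per value (bit width 8), replaces the bit reader with a plain position counter, and uses the hypothetical names {{DrainResult}} and {{DrainLiteralRun}}.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch.
struct DrainResult {
  int values_read;          // reads that succeeded
  int literal_count_left;   // literal slots the decoder still expects
  bool hit_end_of_buffer;   // true if a read was attempted past the last byte
};

// Toy model: keeps reading while the (padded) literal count is positive,
// the way the loop above trusts literal_count_ rather than the buffer size.
// Assumes one byte per value.
DrainResult DrainLiteralRun(const std::vector<uint8_t>& buf, int literal_count) {
  std::size_t pos = 0;
  DrainResult r{0, literal_count, false};
  while (r.literal_count_left > 0) {
    if (pos >= buf.size()) {
      // In the real decoder this is where GetValue() returns false
      // and the DCHECK fires.
      r.hit_end_of_buffer = true;
      break;
    }
    ++pos;
    ++r.values_read;
    --r.literal_count_left;
  }
  return r;
}
```

Calling {{DrainLiteralRun}} with a 100-byte buffer and a literal count of 104 reads 100 values, leaves a literal count of 4, and reports an attempted read past the end of the buffer, which matches the scenario described above.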



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
