[ https://issues.apache.org/jira/browse/ORC-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Panagiotis Garefalakis updated ORC-703:
---------------------------------------
Description:
ORC uses RLE to encode and decode integers.
The RLE encoding/decoding algorithm comprises four sub-encodings:
Short Repeat : used for short repeating integer sequences.
Direct : used for integer sequences whose values have a relatively constant
bit width.
Patched Base : used for integer sequences whose bit widths vary a lot.
Delta : used for monotonically increasing or decreasing sequences.
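To make the four sequence shapes concrete, here is a toy Python sketch that labels a sequence with the sub-encoding whose description it best matches. It is a hypothetical illustration only, not the ORC writer's actual selection logic: the helper names (zigzag, classify), the 3-to-10 run length used for Short Repeat, and the 90th-percentile width test used for Patched Base are simplifying assumptions.

```python
def zigzag(v: int) -> int:
    # Map signed to unsigned so small negatives stay narrow: 0->0, -1->1, 1->2, -2->3
    return 2 * v if v >= 0 else -2 * v - 1


def classify(values: list[int]) -> str:
    """Toy heuristic (hypothetical): NOT the ORC writer's selection logic."""
    if not values:
        raise ValueError("need at least one value")
    # Short Repeat: a short run (assumed 3 to 10 values) of one repeated value.
    if 3 <= len(values) <= 10 and len(set(values)) == 1:
        return "SHORT_REPEAT"
    # Delta: the sequence never changes direction (monotonic).
    diffs = [b - a for a, b in zip(values, values[1:])]
    if diffs and (all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)):
        return "DELTA"
    # Compare the widest value with the 90th-percentile width: a few outliers
    # much wider than the rest suggest Patched Base, otherwise Direct.
    widths = sorted(zigzag(v).bit_length() for v in values)
    p90 = widths[int(0.9 * (len(widths) - 1))]
    if widths[-1] - p90 > 1:
        return "PATCHED_BASE"
    return "DIRECT"
```

For example, classify([1, 2, 1, 3, 2, 1, 2, 1, 3, 1_000_000]) returns "PATCHED_BASE" because one outlier is far wider than the others. A real writer has more inputs to weigh (for example header overhead and run length), which this sketch ignores.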
This bug occurs in the **Patched Base** sub-encoding for large negative numbers.
In Patched Base, the base value is encoded using 1 to 8 bytes, and we use
[3 bits to store the base value width|https://orc.apache.org/specification/ORCv2/]
as the byte count minus one (0 to 7).
If the base value is actually 8 bytes in length, the stored base width should
therefore be 7.
Currently, this value can go up to 8, which does not fit in 3 bits and can
result in inconsistent data as part of the encoding procedure.
In extreme cases, the process reading the ORC data can even crash with a core
dump after accessing an illegal address.
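The overflow can be sketched in Python under a few assumptions: the base value is stored as big-endian sign-magnitude (most significant bit = sign), as described in the ORCv2 specification linked above, and the header byte carrying the base width keeps (BW - 1) in its top 3 bits with the 5-bit encoded patch width below it. The helper names (base_width_bytes, encode_base, pack_bw_pw, pack_bw_pw_unchecked, read_bw) are hypothetical and do not correspond to ORC's Java or C++ code.

```python
# Hypothetical helpers, assuming the ORCv2 Patched Base layout: the base value
# is big-endian sign-magnitude (MSB = sign bit), and one header byte holds
# (BW - 1) in its top 3 bits and the encoded patch width (PW) in its low 5 bits.


def base_width_bytes(base: int) -> int:
    """Bytes needed for `base` as sign-magnitude: magnitude bits + 1 sign bit."""
    total_bits = abs(base).bit_length() + 1
    return (total_bits + 7) // 8


def encode_base(base: int) -> bytes:
    """Big-endian sign-magnitude bytes for `base`; rejects widths over 8 bytes."""
    width = base_width_bytes(base)
    if width > 8:
        raise ValueError(f"base {base} needs {width} bytes; BW allows 1..8")
    magnitude = abs(base)
    if base < 0:
        magnitude |= 1 << (width * 8 - 1)  # set the sign bit (MSB)
    return magnitude.to_bytes(width, "big")


def pack_bw_pw(bw_bytes: int, pw_code: int) -> int:
    """Checked packing: refuse a BW that cannot be stored as BW - 1 in 3 bits."""
    if not 1 <= bw_bytes <= 8:
        raise ValueError(f"BW={bw_bytes} does not fit the 3-bit field")
    return ((bw_bytes - 1) << 5) | (pw_code & 0x1F)


def pack_bw_pw_unchecked(bw_bytes: int, pw_code: int) -> int:
    """Unchecked packing: BW - 1 = 8 overflows and is truncated to one byte."""
    return (((bw_bytes - 1) << 5) | (pw_code & 0x1F)) & 0xFF


def read_bw(header_byte: int) -> int:
    """What a reader decodes from the top 3 bits: BW = field + 1."""
    return ((header_byte >> 5) & 0x07) + 1


if __name__ == "__main__":
    worst = -(2**63)  # smallest 64-bit signed value
    print(base_width_bytes(worst))              # 9: one byte more than BW allows
    print(read_bw(pack_bw_pw_unchecked(9, 3)))  # 1: reader sees a 1-byte base
```

In this exact-width sketch only -2^63 needs 9 bytes (64 magnitude bits plus the sign bit); the real writer may derive the width differently, which could explain why the issue reports the problem for large negative numbers in general. Packing a width of 9 without the range check stores 8, whose low 3 bits are 0, so the reader decodes a 1-byte base and parses the following bytes at the wrong offsets.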
was:
ORC uses RLE to encode and decode integers.
The RLE encoding/decoding algorithm comprises four types:
Short Repeat : used for short repeating integer sequences.
Direct : used for integer sequences whose values have a relatively constant bit
width.
Patched Base : used for integer sequences whose bit widths vary a lot.
Delta : used for monotonically increasing or decreasing sequences.
This bug occurs in the Patched Base type for large negative numbers.
In Patched Base, the base value is stored in 1 to 8 bytes, and its width is
encoded as 0 to 7.
If the base value is 8 bytes, the encoded base width should be 7.
But it is currently encoded as 8, which is a problem.
Because of this wrong encoding procedure, the data read back is inconsistent
with the loaded data.
In extreme cases, the process dumps core because of an illegal address access.
> RLE encoding bug on large negative integer
> ------------------------------------------
>
> Key: ORC-703
> URL: https://issues.apache.org/jira/browse/ORC-703
> Project: ORC
> Issue Type: Bug
> Reporter: lichaoyong
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)