[ https://issues.apache.org/jira/browse/ORC-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Panagiotis Garefalakis updated ORC-703:
---------------------------------------
    Description: 
ORC uses run-length encoding (RLE) to encode and decode integers.
 The RLE algorithm comprises four sub-encodings:
 Short Repeat : used for short repeating integer sequences.
 Direct : used for integer sequences whose values have a relatively constant 
bit width.
 Patched Base : used for integer sequences whose bit widths vary a lot.
 Delta : used for monotonically increasing or decreasing sequences.

This bug occurs in the **Patched Base** type for large negative numbers.
 In Patched Base, we use [3 bits to store base 
value|https://orc.apache.org/specification/ORCv2/] width, which represents a 
width of 1 to 8 bytes.
 If the base value is actually 8 bytes long, the stored base width should be 
7, because a 3-bit field can only hold values from 0 to 7.
 Currently, this value can go up to 8, which can produce inconsistent data 
during encoding.
 In extreme cases, the ORC read process can even core dump by referencing an 
illegal address.
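
The width rule above can be sketched as follows. This is only an illustration, not the ORC writer code: the function names are hypothetical, and it assumes the base value is stored in sign-magnitude form (one extra bit for the sign), which is my reading of the linked specification.

```python
def base_width_bytes(base: int) -> int:
    """Bytes needed for |base| plus one sign bit (assumed sign-magnitude form)."""
    bits = abs(base).bit_length() + 1  # +1 for the sign bit
    return max(1, (bits + 7) // 8)     # round up to whole bytes


def encode_width_field(num_bytes: int) -> int:
    """Map a byte count of 1..8 to the 3-bit header field 0..7."""
    if not 1 <= num_bytes <= 8:
        raise ValueError(f"base width of {num_bytes} bytes does not fit in 3 bits")
    return num_bytes - 1


def decode_width_field(field: int) -> int:
    """Map the 3-bit header field 0..7 back to a byte count of 1..8."""
    if not 0 <= field <= 7:
        raise ValueError(f"width field {field} is not a 3-bit value")
    return field + 1


# A large negative base that needs all 8 bytes must be stored as field 7.
large_negative = -(2 ** 62)
num_bytes = base_width_bytes(large_negative)
field = encode_width_field(num_bytes)
print(num_bytes, field, decode_width_field(field))
```

Storing 8 instead of 7 needs a fourth bit, so the reader can no longer recover the intended width from the 3-bit field.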

  was:
ORC has use RLE to encoding/decoding integer.
 Four types are comprised of the RLE encoding/decoding algorithm.
 Short Repeat : used for short repeating integer sequences.
 Direct : used for integer sequences whose values have a relatively constant 
bit width.
 Patched Base : used for integer sequences whose bit widths varies a lot.
 Delta : used for monotonically increasing or decreasing sequences.

This bug occurs in **Patched Base** Type for large negative number.
In patched base, we use [3 bits to store base 
value|https://orc.apache.org/specification/ORCv2/] width that is encoded using 
1 to 8 bytes.
If the base value is actually 8 bytes in length, the value for base width 
should be 7.
Currently, this value can go up to 8 what can result in inconsistent data as 
part of the encoding procedure.
In extreme cases, the ORC read process can even be cored dump referring to an 
illegal address.


> RLE encoding bug on large negative integer
> ------------------------------------------
>
>                 Key: ORC-703
>                 URL: https://issues.apache.org/jira/browse/ORC-703
>             Project: ORC
>          Issue Type: Bug
>            Reporter: lichaoyong
>            Priority: Major
>
> ORC uses run-length encoding (RLE) to encode and decode integers.
>  The RLE algorithm comprises four sub-encodings:
>  Short Repeat : used for short repeating integer sequences.
>  Direct : used for integer sequences whose values have a relatively constant 
> bit width.
>  Patched Base : used for integer sequences whose bit widths vary a lot.
>  Delta : used for monotonically increasing or decreasing sequences.
> This bug occurs in the **Patched Base** type for large negative numbers.
>  In Patched Base, we use [3 bits to store base 
> value|https://orc.apache.org/specification/ORCv2/] width, which represents 
> a width of 1 to 8 bytes.
>  If the base value is actually 8 bytes long, the stored base width should be 
> 7, because a 3-bit field can only hold values from 0 to 7.
>  Currently, this value can go up to 8, which can produce inconsistent data 
> during encoding.
>  In extreme cases, the ORC read process can even core dump by referencing 
> an illegal address.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
