Github user prasanthj commented on a diff in the pull request:

    https://github.com/apache/orc/pull/245#discussion_r180942367

    --- Diff: site/_docs/encodings.md ---
    @@ -123,6 +127,41 @@ DIRECT_V2     | PRESENT         | Yes      | Boolean RLE
                   | DATA            | No       | Unbounded base 128 varints
                   | SECONDARY       | No       | Unsigned Integer RLE v2

    +In ORC 2.0, DECIMAL and DECIMAL_V2 encodings are introduced and scale
    +stream is totally removed as all decimal values use the same scale.
    +There are two difference cases: precision<=18 and precision>18.
    +
    +### Decimal Encoding for precision <= 18
    +
    +When precision is no greater than 18, decimal values can be fully
    +represented by 64-bit signed integers which are stored in DATA stream
    +and use signed integer RLE.
    +
    +Encoding      | Stream Kind     | Optional | Contents
    +:------------ | :-------------- | :------- | :-------
    +DECIMAL       | PRESENT         | Yes      | Boolean RLE
    +              | DATA            | No       | Signed Integer RLE v1
    +DECIMAL_V2    | PRESENT         | Yes      | Boolean RLE
    +              | DATA            | No       | Signed Integer RLE v2
    --- End diff --

    Will there be a new RLE version in ORC 2.0? Since we already have two RLE versions, this is confusing (versions <0.11 use RLE v1 and versions >=0.12 use RLE v2). If this is a newer RLE version, then we should probably rename it to RLE v3. If there will not be a new RLE version, then we should probably not use the old RLE v1.
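
For context on the precision <= 18 case in the quoted diff, here is a minimal sketch of why such decimals fit in a 64-bit signed integer once the scale is fixed per column. It is not taken from the ORC code base: the helper names `fits_in_int64` and `to_unscaled` are hypothetical, and the sketch only illustrates the arithmetic, not the RLE step.

```python
from decimal import Decimal

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_in_int64(precision: int) -> bool:
    """True if every unscaled value with `precision` digits fits in int64."""
    largest = 10**precision - 1  # e.g. 999...9 with `precision` nines
    return INT64_MIN <= -largest and largest <= INT64_MAX


def to_unscaled(value: Decimal, scale: int) -> int:
    """Return value * 10**scale as an integer (hypothetical helper).

    With one scale shared by the whole column, only this integer
    would need to be stored; the scale itself is not stored per value.
    """
    shifted = value.scaleb(scale)
    if shifted != shifted.to_integral_value():
        raise ValueError("value has more fractional digits than the scale")
    return int(shifted)


if __name__ == "__main__":
    print(fits_in_int64(18))                    # 18 digits stay below 2**63
    print(fits_in_int64(19))                    # 19 digits can exceed 2**63 - 1
    print(to_unscaled(Decimal("123.45"), 2))
```

The boundary falls at 18 because 10^18 - 1 is below 2^63 - 1, while 10^19 - 1 is above it.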