Github user wgtmac commented on a diff in the pull request:
https://github.com/apache/orc/pull/245#discussion_r181168352
--- Diff: site/_docs/encodings.md ---
@@ -109,10 +109,20 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
Decimal was introduced in Hive 0.11 with infinite precision (the total
number of digits). In Hive 0.13, the definition was changed to limit
the precision to a maximum of 38 digits, which conveniently uses 127
-bits plus a sign bit. The current encoding of decimal columns stores
-the integer representation of the value as an unbounded length zigzag
-encoded base 128 varint. The scale is stored in the SECONDARY stream
-as an signed integer.
+bits plus a sign bit.
+
+DIRECT and DIRECT_V2 encodings of decimal columns store the integer
+representation of the value as an unbounded length zigzag encoded base
+128 varint. The scale is stored in the SECONDARY stream as a signed
+integer.
+
+In ORC 2.0, a DECIMAL encoding is introduced that removes the scale
+stream entirely, since all values in a decimal column share the same
+scale. When the precision is no greater than 18, decimal values can
+be fully represented by the DATA stream, which stores 64-bit signed
+integers. When the precision is greater than 18, a 128-bit signed
+integer stores the decimal value: the DATA stream stores the upper 64
+bits and the SECONDARY stream holds the lower 64 bits. Both streams
+use signed integer RLE v2.
--- End diff --
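To make the DIRECT/DIRECT_V2 description in the diff concrete, here is a minimal Java sketch of the unbounded zigzag base-128 varint encoding of a decimal's unscaled value. The class and method names are illustrative, not the actual ORC writer; it assumes the unscaled value arrives as a `BigInteger`.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;

public final class DecimalVarint {

  // ZigZag maps signed to non-negative: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
  static BigInteger zigZag(BigInteger n) {
    return n.signum() >= 0
        ? n.shiftLeft(1)                                     // 2n
        : n.shiftLeft(1).negate().subtract(BigInteger.ONE);  // -2n - 1
  }

  // Base-128 varint: low 7-bit group first, high bit set on all but the last byte.
  static byte[] toVarint(BigInteger u) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BigInteger mask = BigInteger.valueOf(0x7f);
    do {
      int group = u.and(mask).intValue();
      u = u.shiftRight(7);
      out.write(u.signum() != 0 ? group | 0x80 : group);
    } while (u.signum() != 0);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // 12345.67 at scale 2 has unscaled value 1234567; the scale (2) would be
    // written to the SECONDARY stream as a signed integer.
    byte[] data = toVarint(zigZag(BigInteger.valueOf(1234567)));
    System.out.println(data.length + " bytes for the DATA stream");
  }
}
```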
The main problem is that we don't have 128-bit integer RLE on hand.
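For illustration, a rough sketch of the high/low split the new text describes for precision greater than 18, i.e. carrying a 128-bit value on two existing 64-bit signed RLE v2 streams rather than a nonexistent 128-bit RLE. It assumes the unscaled value arrives as a `BigInteger`; the names are illustrative only.

```java
import java.math.BigInteger;

final class Decimal128Split {
  private static final BigInteger LOW_MASK = new BigInteger("FFFFFFFFFFFFFFFF", 16);

  // Split the 128-bit two's-complement value into {upper, lower} 64-bit words.
  static long[] split(BigInteger unscaled) {
    long lower = unscaled.longValue();                // low 64 bits
    long upper = unscaled.shiftRight(64).longValue(); // high 64 bits, sign-extended
    return new long[] { upper, lower };
  }

  // Rebuild the value: upper carries the sign, lower is treated as unsigned.
  static BigInteger join(long upper, long lower) {
    return BigInteger.valueOf(upper).shiftLeft(64)
        .or(BigInteger.valueOf(lower).and(LOW_MASK));
  }

  public static void main(String[] args) {
    BigInteger v = new BigInteger("-12345678901234567890123456789012345678");
    long[] words = split(v); // words[0] -> DATA stream, words[1] -> SECONDARY stream
    System.out.println(join(words[0], words[1]).equals(v)); // true
  }
}
```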
---