Github user wgtmac commented on a diff in the pull request:
https://github.com/apache/orc/pull/245#discussion_r180961097
--- Diff: site/_docs/encodings.md ---
@@ -123,6 +127,41 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
| DATA | No | Unbounded base 128 varints
| SECONDARY | No | Unsigned Integer RLE v2
+In ORC 2.0, the DECIMAL and DECIMAL_V2 encodings are introduced and the
+scale stream is removed entirely, since all decimal values use the same scale.
+There are two different cases: precision <= 18 and precision > 18.
+
+### Decimal Encoding for precision <= 18
+
+When precision is no greater than 18, decimal values can be fully
+represented by 64-bit signed integers which are stored in DATA stream
+and use signed integer RLE.
+
+Encoding | Stream Kind | Optional | Contents
+:------------ | :-------------- | :------- | :-------
+DECIMAL | PRESENT | Yes | Boolean RLE
+ | DATA | No | Signed Integer RLE v1
+DECIMAL_V2 | PRESENT | Yes | Boolean RLE
+ | DATA | No | Signed Integer RLE v2
+
+### Decimal Encoding for precision > 18
+
+When precision is greater than 18, decimal value is split into two
+parts: a signed integer stores higher 64 bits and an unsigned integer
+stores lower 64 bits. Therefore, a DATA stream is utilized to store
+the higher 64-bit signed integer of decimal values and a SECONDARY
+stream holds the lower 64-bit unsigned integer of decimal values.
+Both streams use RLE and are not optional in this case.
+
+Encoding | Stream Kind | Optional | Contents
+:------------ | :-------------- | :------- | :-------
+DECIMAL | PRESENT | Yes | Boolean RLE
+ | DATA | No | Signed Integer RLE v1
+ | SECONDARY | No | Unsigned Integer RLE v1
+DECIMAL_V2 | PRESENT | Yes | Boolean RLE
+ | DATA | No | Signed Integer RLE v1
+ | SECONDARY | No | Unsigned Integer RLE v2
--- End diff --
This would be hacky, since we use int64_t and uint64_t to represent Int128
in C++. I can force signed integer RLE to be used for the uint64_t values. Not
sure if Java can do the same thing.
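
To make the workaround concrete, here is a minimal sketch of what forcing signed RLE on the low word could look like. It assumes the high/low split described in the diff; `Decimal128Parts`, `lowBitsAsSigned`, `signedAsLowBits`, `zigzagEncode`, and `zigzagDecode` are hypothetical names for illustration, not functions from the ORC C++ code base. The zigzag step only stands in for the mapping that ORC's signed integer encodings apply before writing; it is not the RLE writer itself.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical holder mirroring the high/low layout described in the diff.
struct Decimal128Parts {
  int64_t high;   // upper 64 bits, signed
  uint64_t low;   // lower 64 bits, unsigned
};

// Reinterpret the unsigned low word as a signed value without changing its
// bit pattern, so it can be handed to a signed integer encoder.
inline int64_t lowBitsAsSigned(uint64_t low) {
  int64_t out;
  std::memcpy(&out, &low, sizeof(out));
  return out;
}

// Inverse of lowBitsAsSigned: recover the unsigned low word on the read path.
inline uint64_t signedAsLowBits(int64_t value) {
  uint64_t out;
  std::memcpy(&out, &value, sizeof(out));
  return out;
}

// Zigzag mapping (assumption: same scheme the signed encoder applies).
// Relies on arithmetic right shift of negative values, which mainstream
// compilers provide and C++20 guarantees.
inline uint64_t zigzagEncode(int64_t v) {
  return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}

inline int64_t zigzagDecode(uint64_t u) {
  return static_cast<int64_t>(u >> 1) ^ -static_cast<int64_t>(u & 1);
}
```

The round trip is lossless for every 64-bit pattern, so correctness is not the concern. The hacky part is that low words with the top bit set become negative values on the signed path, which may encode less compactly than they would with an unsigned encoder.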
---