Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/245#discussion_r181525137

    --- Diff: site/_docs/encodings.md ---
    @@ -109,10 +109,20 @@ DIRECT_V2 | PRESENT | Yes | Boolean RLE
     Decimal was introduced in Hive 0.11 with infinite precision (the total
     number of digits). In Hive 0.13, the definition was change to limit
     the precision to a maximum of 38 digits, which conveniently uses 127
    -bits plus a sign bit. The current encoding of decimal columns stores
    -the integer representation of the value as an unbounded length zigzag
    -encoded base 128 varint. The scale is stored in the SECONDARY stream
    -as an signed integer.
    +bits plus a sign bit.
    +
    +DIRECT and DIRECT_V2 encodings of decimal columns stores the integer
    +representation of the value as an unbounded length zigzag encoded base
    +128 varint. The scale is stored in the SECONDARY stream as an signed
    +integer.
    +
    +In ORC 2.0, DECIMAL_V1 and DECIMAL_V2 encodins are introduced and
    --- End diff --

In ORCv2, we'll just pick an RLE and not leave it selectable.

In terms of the encoding names, I'm a bit torn. My original inclination would be to use DECIMAL64 and DECIMAL128 as encoding names. However, it would be nice to have the ability to use dictionaries, so we'd need dictionary forms of them too. Thoughts?
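For readers less familiar with the DIRECT/DIRECT_V2 decimal layout described in the diff, here is a minimal Python sketch under stated assumptions. Each value is split into an unscaled integer and a scale. The unscaled integer is zigzag encoded and written as an unbounded base-128 varint (7 bits per byte, low group first, high bit as the continuation flag). The scales are collected into a plain list that stands in for the SECONDARY stream; the integer RLE that ORC applies to that stream is deliberately not modeled. All function names here (`encode_column`, `decode_column`, and so on) are hypothetical illustrations, not part of any ORC API.

```python
from decimal import Decimal


def zigzag(n: int) -> int:
    """Map a signed integer of any size to a non-negative one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(z: int) -> int:
    """Invert zigzag(): even values are non-negative, odd values are negative."""
    return (z >> 1) ^ -(z & 1)


def write_varint(u: int, out: bytearray) -> None:
    """Append u as a base-128 varint: 7 bits per byte, least significant group
    first, high bit set on every byte except the last. No length limit."""
    while True:
        low = u & 0x7F
        u >>= 7
        if u:
            out.append(low | 0x80)
        else:
            out.append(low)
            return


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint starting at buf[pos]; return (value, next position)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return result, pos


def split_decimal(d: Decimal) -> tuple[int, int]:
    """Split d into (unscaled, scale) so that d == unscaled * 10**-scale, exactly."""
    sign, digits, exponent = d.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("this sketch only handles finite decimals")
    unscaled = int("".join(map(str, digits)))
    return (-unscaled if sign else unscaled), -exponent


def join_decimal(unscaled: int, scale: int) -> Decimal:
    """Rebuild the Decimal from its parts via the tuple constructor (no context rounding)."""
    digits = tuple(int(c) for c in str(abs(unscaled)))
    return Decimal((1 if unscaled < 0 else 0, digits, -scale))


def encode_column(values):
    """Return (DATA bytes, SECONDARY scales); the SECONDARY integer RLE is not modeled."""
    data = bytearray()
    scales = []
    for d in values:
        unscaled, scale = split_decimal(d)
        write_varint(zigzag(unscaled), data)
        scales.append(scale)
    return bytes(data), scales


def decode_column(data: bytes, scales):
    """Read one varint per scale entry and reassemble the decimals."""
    values, pos = [], 0
    for scale in scales:
        z, pos = read_varint(data, pos)
        values.append(join_decimal(unzigzag(z), scale))
    return values
```

Under these assumptions, a small value such as -1.5 (unscaled -15, scale 1) takes a single data byte, while a full 38-digit value, whose magnitude needs 127 bits as the diff notes, zigzags to a 128-bit number and takes 19 varint bytes. For comparison, any precision up to 18 digits fits in a signed 64-bit integer, because 10^18 - 1 < 2^63 - 1. A fixed-width encoding along the lines of the DECIMAL64 name mentioned above could rely on that bound.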