Github user omalley commented on a diff in the pull request:

    https://github.com/apache/orc/pull/245#discussion_r181525137
  
    @@ -109,10 +109,20 @@ DIRECT_V2     | PRESENT         | Yes      | Boolean 
RLE
     Decimal was introduced in Hive 0.11 with infinite precision (the total
     number of digits). In Hive 0.13, the definition was changed to limit
     the precision to a maximum of 38 digits, which conveniently uses 127
    -bits plus a sign bit. The current encoding of decimal columns stores
    -the integer representation of the value as an unbounded length zigzag
    -encoded base 128 varint. The scale is stored in the SECONDARY stream
    -as an signed integer.
    +bits plus a sign bit.
    +
    +DIRECT and DIRECT_V2 encodings of decimal columns store the integer
    +representation of the value as an unbounded length zigzag encoded base
    +128 varint. The scale is stored in the SECONDARY stream as a signed
    +integer.
    +
    +In ORC 2.0, DECIMAL_V1 and DECIMAL_V2 encodings are introduced and
    --- End diff --
    
    In ORCv2, we'll just pick a RLE and not leave it pickable.
    
    In terms of the encoding names, I'm a bit torn. My original inclination
    would be to use DECIMAL64 and DECIMAL128 as encoding names. However, it
    would be nice to have the ability to use dictionaries, so we'd need
    dictionary forms of them too. Thoughts?
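For readers unfamiliar with the encoding the diff describes, the following is a minimal illustrative sketch of writing a signed value as a zigzag-encoded base-128 varint. This is not ORC's actual writer code; the class and method names here are invented for the example.

```java
import java.io.ByteArrayOutputStream;

public class ZigZagVarintSketch {
    // ZigZag maps signed values to unsigned ones so that small magnitudes
    // (positive or negative) produce small encoded values: 0, -1, 1, -2, ...
    // become 0, 1, 2, 3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Base-128 varint: emit 7 bits per byte, low group first, setting the
    // high bit of every byte except the last as a continuation flag.
    static byte[] encode(long n) {
        long v = zigZag(n);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7fL) != 0) {
            out.write((int) ((v & 0x7f) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // -1 zigzags to 1 and fits in one byte; 64 zigzags to 128,
        // which needs two bytes.
        System.out.println(encode(-1).length);
        System.out.println(encode(64).length);
    }
}
```

The "unbounded length" in the diff means the loop above simply keeps emitting continuation bytes for as many 7-bit groups as the value needs, which is what lets the same stream format cover the full 38-digit (127-bit-plus-sign) range.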

