[GitHub] orc pull request #245: ORC-161: Proposal for new decimal encodings and stati...

wgtmac Wed, 11 Apr 2018 21:16:32 -0700

Github user wgtmac commented on a diff in the pull request:

    https://github.com/apache/orc/pull/245#discussion_r180961097
  
    --- Diff: site/_docs/encodings.md ---
    @@ -123,6 +127,41 @@ DIRECT_V2     | PRESENT         | Yes      | Boolean RLE
                   | DATA            | No       | Unbounded base 128 varints
                   | SECONDARY       | No       | Unsigned Integer RLE v2
     
    +In ORC 2.0, DECIMAL and DECIMAL_V2 encodings are introduced and the scale
    +stream is removed entirely, as all decimal values use the same scale.
    +There are two different cases: precision<=18 and precision>18.
    +
    +### Decimal Encoding for precision <= 18
    +
    +When precision is no greater than 18, decimal values can be fully
    +represented by 64-bit signed integers which are stored in DATA stream
    +and use signed integer RLE.
    +
    +Encoding      | Stream Kind     | Optional | Contents
    +:------------ | :-------------- | :------- | :-------
    +DECIMAL       | PRESENT         | Yes      | Boolean RLE
    +              | DATA            | No       | Signed Integer RLE v1
    +DECIMAL_V2    | PRESENT         | Yes      | Boolean RLE
    +              | DATA            | No       | Signed Integer RLE v2
    +
    +### Decimal Encoding for precision > 18
    +
    +When precision is greater than 18, each decimal value is split into two
    +parts: a signed integer stores the higher 64 bits and an unsigned integer
    +stores the lower 64 bits. Therefore, a DATA stream is used to store
    +the higher 64-bit signed integer of decimal values and a SECONDARY
    +stream holds the lower 64-bit unsigned integer of decimal values.
    +Both streams use RLE and are not optional in this case.
    +
    +Encoding      | Stream Kind     | Optional | Contents
    +:------------ | :-------------- | :------- | :-------
    +DECIMAL       | PRESENT         | Yes      | Boolean RLE
    +              | DATA            | No       | Signed Integer RLE v1
    +              | SECONDARY       | No       | Unsigned Integer RLE v1
    +DECIMAL_V2    | PRESENT         | Yes      | Boolean RLE
    +              | DATA            | No       | Signed Integer RLE v1
    +              | SECONDARY       | No       | Unsigned Integer RLE v2
    --- End diff --
    
    This would be hacky, since in C++ we use an int64_t and a uint64_t to
    represent Int128. I can force signed integer RLE onto the uint64_t
    integers, but I am not sure whether Java can do the same thing.
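
A minimal C++ sketch of the reinterpretation described above, assuming the low 64 bits are pushed through a signed RLE writer by keeping the bit pattern unchanged. `Int128Parts`, `lowAsSigned`, `lowFromSigned`, and `fromInt64` are hypothetical names for illustration only; they are not the ORC `Int128` class or any ORC API.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 128-bit decimal value split into two words.
// This is NOT the ORC Int128 class, only an illustration of the layout.
struct Int128Parts {
  int64_t high;   // upper 64 bits, carries the sign (DATA stream)
  uint64_t low;   // lower 64 bits, unsigned (SECONDARY stream)
};

// Reinterpret the unsigned low word as a signed integer, bit for bit,
// so that it could be handed to a signed integer RLE writer.
inline int64_t lowAsSigned(uint64_t low) {
  int64_t out;
  std::memcpy(&out, &low, sizeof(out));
  return out;
}

// Inverse of lowAsSigned: recover the unsigned low word after decoding.
inline uint64_t lowFromSigned(int64_t value) {
  uint64_t out;
  std::memcpy(&out, &value, sizeof(out));
  return out;
}

// Sign-extend a 64-bit value into the two-word representation.
inline Int128Parts fromInt64(int64_t value) {
  return Int128Parts{value < 0 ? int64_t{-1} : int64_t{0},
                     static_cast<uint64_t>(value)};
}
```

The round trip is lossless because only the interpretation of the 64 bits changes. One consequence worth noting: low words of 2^63 or above appear as negative numbers to the signed encoder.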

---
