Douglas Drinka created ORC-143:
----------------------------------

             Summary: DELTA encoding may exaggerate number of bits required
                 Key: ORC-143
                 URL: https://issues.apache.org/jira/browse/ORC-143
             Project: Orc
          Issue Type: Bug
          Components: Java
    Affects Versions: 1.4.0
            Reporter: Douglas Drinka
            Priority: Minor


Consider the following code:
{code:title=RunLengthIntegerWriterV2.java, determineEncoding()|borderStyle=solid}
    this.min = literals[0];
    long max = literals[0];
    final long initialDelta = literals[1] - literals[0];
    long currDelta = initialDelta;
    long deltaMax = initialDelta;
    this.adjDeltas[0] = initialDelta;
{code}

Given the following sequence of longs: {0, 10000, 10001, 10002, 10003, 10004, 10005}, {{deltaMax}} would be 10000.  {{deltaMax}} is used to determine the bit width of the encoded delta array, but the bit-packed output doesn't include the first delta; that delta is instead encoded in Delta Base as a varint.
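
To make the effect concrete, here is a small standalone sketch (it is not ORC code, and it uses a plain bit-count helper rather than ORC's own width calculation, so treat the exact widths as approximate) showing how seeding {{deltaMax}} with the first delta inflates the chosen width for this sequence:

{code:title=Standalone illustration (not ORC code)|borderStyle=solid}
public class DeltaMaxDemo {
    // Plain bit-count helper; ORC rounds to its own fixed set of widths,
    // so the encoded width may differ slightly from this.
    static int bitsFor(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    public static void main(String[] args) {
        long[] literals = {0, 10000, 10001, 10002, 10003, 10004, 10005};

        long deltaMax = literals[1] - literals[0]; // seeded with the first delta, as today
        long packedMax = 0;                        // largest delta that is actually bit-packed

        for (int i = 2; i < literals.length; i++) {
            long d = Math.abs(literals[i] - literals[i - 1]);
            deltaMax = Math.max(deltaMax, d);
            packedMax = Math.max(packedMax, d);
        }

        // Prints: deltaMax = 10000 -> 14 bits, largest bit-packed delta = 1 -> 1 bit
        System.out.println("deltaMax = " + deltaMax + " -> " + bitsFor(deltaMax) + " bits");
        System.out.println("largest bit-packed delta = " + packedMax + " -> " + bitsFor(packedMax) + " bits");
    }
}
{code}

The largest delta that actually gets bit-packed is 1, yet the width is chosen from 10000.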

I believe {{deltaMax}} should be initialized to 0, so that the later {{(i > 1)}} check correctly excludes the first delta from the bit-width calculation.
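
Something like the following is what I have in mind (an untested sketch of the initialization only; the rest of {{determineEncoding()}} would stay as it is):

{code:title=Proposed change (untested sketch)|borderStyle=solid}
    this.min = literals[0];
    long max = literals[0];
    final long initialDelta = literals[1] - literals[0];
    long currDelta = initialDelta;
    // Start at 0: the first delta is written to Delta Base as a varint,
    // so it should not influence the bit width of the packed deltas.
    long deltaMax = 0;
    this.adjDeltas[0] = initialDelta;
{code}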

Sorry for not attaching a pull request with a regression test case; I'm not set up for Java development here.  It may also be that I'm misreading this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
