Douglas Drinka created ORC-143:
----------------------------------
Summary: DELTA encoding may exaggerate number of bits required
Key: ORC-143
URL: https://issues.apache.org/jira/browse/ORC-143
Project: Orc
Issue Type: Bug
Components: Java
Affects Versions: 1.4.0
Reporter: Douglas Drinka
Priority: Minor
Consider the following code:
{code:title=RunLengthIntegerWriterV2.java, determineEncoding()|borderStyle=solid}
this.min = literals[0];
long max = literals[0];
final long initialDelta = literals[1] - literals[0];
long currDelta = initialDelta;
long deltaMax = initialDelta;
this.adjDeltas[0] = initialDelta;
{code}
Given the sequence of longs {0, 10000, 10001, 10002, 10003, 10004, 10005},
{{deltaMax}} would be 10000. {{deltaMax}} is used to determine the bit width of
the encoded delta array, but the bit-packed output does not include the first
delta; that delta is written to Delta Base as a varint instead. Seeding
{{deltaMax}} with it can therefore exaggerate the bit width chosen for the
packed deltas.
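To make the size impact concrete, here is a stand-alone sketch of the arithmetic for that sequence. It does not use ORC's actual bit-width utilities; the class and helper names here are made up for illustration only:
{code:java}
public class DeltaBitWidthDemo {
  // Hypothetical helper (not ORC's own utility): bits needed to represent a
  // non-negative value, with a minimum of 1 bit.
  static int bitsRequired(long value) {
    return value == 0 ? 1 : 64 - Long.numberOfLeadingZeros(value);
  }

  public static void main(String[] args) {
    long[] literals = {0, 10000, 10001, 10002, 10003, 10004, 10005};

    // The first delta is written to Delta Base as a varint, not bit-packed.
    long firstDelta = literals[1] - literals[0];              // 10000

    // Only the later deltas are bit-packed; here each of them is 1.
    long maxLaterDelta = 0;
    for (int i = 2; i < literals.length; i++) {
      maxLaterDelta = Math.max(maxLaterDelta, literals[i] - literals[i - 1]);
    }

    System.out.println(bitsRequired(firstDelta));    // 14: width implied by seeding deltaMax with the first delta
    System.out.println(bitsRequired(maxLaterDelta)); // 1: width the packed deltas actually need
  }
}
{code}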
I believe {{deltaMax}} should be initialized to 0, allowing the later
{{(i > 1)}} check to correctly exclude the first delta.
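For clarity, a minimal, untested sketch of the change being suggested (the surrounding lines are unchanged from the excerpt above); this is offered to illustrate the idea, not as a reviewed patch:
{code:title=Suggested change (sketch)|borderStyle=solid}
this.min = literals[0];
long max = literals[0];
final long initialDelta = literals[1] - literals[0];
long currDelta = initialDelta;
// Seed deltaMax with 0 so the first delta, which goes into Delta Base as a
// varint, no longer drives the bit width chosen for the packed deltas.
long deltaMax = 0;                  // was: long deltaMax = initialDelta;
this.adjDeltas[0] = initialDelta;
// The existing loop's (i > 1) guard then updates deltaMax only from the
// second delta onward.
{code}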
Sorry for not including a pull request with a regression test case; I'm not set
up for Java development here. It may also be that I'm reading this wrong.