[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712916#comment-13712916 ]
Owen O'Malley commented on HIVE-4123:
-------------------------------------

Comments:
* Merge Utils into SerializationUtils.
* Use the zigzag encode/decode in the SerializationUtils.read/writeVslong.
* Move Utils.nextLong to the test code.
* Utils.getTotalBytesRequired should just use long math. (n * numBits + 7) / 8 should work.
* Rename IntegerCompressionReader/Writer to RunLengthIntegerReader/WriterV2.
* Create an interface IntegerReader that has:
** seek
** next
** skip
* Make RunLengthIntegerReader and RunLengthIntegerReaderV2 implement IntegerReader.
* The TreeReaders should declare the fields as IntegerReader.
* Each of the startStripe methods should use the encoding to create the right implementation of IntegerReader.
* We should do the same with an IntegerWriter interface.
* Replace fixedBitSizes with static methods in SerializationUtils:
** static int encodeBitWidth(int n)
** static int decodeBitWidth(int n)
* Finding the percentiles seems expensive; we should look at an alternative.
* Why is the delta blob zigzag encoded? The sign should always be positive or negative for the entire run.
* Maybe we could create an enum in the Writer that is the version to write, which would look like enum OrcVersion { V0_11, V0_12 }, and the StreamFactory could provide the version to the TreeWriters.

> The RLE encoding for ORC can be improved
> ----------------------------------------
>
>                 Key: HIVE-4123
>                 URL: https://issues.apache.org/jira/browse/HIVE-4123
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Prasanth J
>         Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, ORC-Compression-Ratio-Comparison.xlsx
>
>
> The run length encoding of integers can be improved:
> * tighter bit packing
> * allow delta encoding
> * allow longer runs
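
For reference, two items from the comment can be sketched in a few lines: the zigzag encode/decode suggested for read/writeVslong, and the long-math form of getTotalBytesRequired, (n * numBits + 7) / 8. This is only an illustrative sketch. The class name RleSketch and the method names and signatures are assumptions made for this example, not the actual SerializationUtils code in the patch.

```java
/** Hypothetical helper for illustration; not the real Hive SerializationUtils. */
final class RleSketch {
    private RleSketch() {}

    /**
     * Zigzag-encodes a signed long so that values of small magnitude map to
     * small unsigned codes: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
     */
    static long zigzagEncode(long value) {
        // Arithmetic shift right by 63 yields all ones for negatives, zero otherwise.
        return (value << 1) ^ (value >> 63);
    }

    /** Inverse of zigzagEncode. */
    static long zigzagDecode(long encoded) {
        // The low bit carries the sign; the remaining bits carry the magnitude.
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    /**
     * Bytes needed to store n values of numBits bits each, rounded up to a
     * whole byte, using plain long arithmetic as suggested in the comment.
     */
    static long totalBytesRequired(long n, long numBits) {
        return (n * numBits + 7) / 8;
    }

    public static void main(String[] args) {
        long[] samples = {0L, -1L, 1L, -2L, 2L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long v : samples) {
            long z = zigzagEncode(v);
            System.out.println(v + " -> " + Long.toUnsignedString(z)
                + " -> " + zigzagDecode(z));
        }
        // 10 values at 3 bits each is 30 bits, which rounds up to 4 bytes.
        System.out.println(totalBytesRequired(10, 3));
    }
}
```

The sketch also bears on the question about the delta blob: zigzag spends one bit per value on the sign, so if every delta in a run has the same sign, that bit could in principle be stored once per run instead.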