[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731580#comment-13731580 ]
Prasanth J commented on HIVE-4123:
----------------------------------

The following fixes were added to this patch:
- Removed FIXMEs.
- Added a new encoding type, DICTIONARY_V2, to identify which integer encoding (DIRECT or DIRECT_V2) dictionaries use. DICTIONARY_V2 uses DIRECT_V2 encoding for the dictionary data and length streams. In the earlier patch, there was no way to determine whether dictionaries used DIRECT or DIRECT_V2 encoding; this patch addresses that. I am not sure if there is any other way to determine this without adding a new encoding type.
- Addressed the code review comment related to having if/then/else in the flush() method of RunLengthIntegerWriterV2.

> The RLE encoding for ORC can be improved
> ----------------------------------------
>
>                 Key: HIVE-4123
>                 URL: https://issues.apache.org/jira/browse/HIVE-4123
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.12.0
>            Reporter: Owen O'Malley
>            Assignee: Prasanth J
>              Labels: orcfile
>             Fix For: 0.12.0
>
>         Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt,
> HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt,
> ORC-Compression-Ratio-Comparison.xlsx
>
>
> The run length encoding of integers can be improved:
> * tighter bit packing
> * allow delta encoding
> * allow longer runs
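To illustrate the DICTIONARY_V2 point in the comment, here is a minimal sketch of how a reader might infer the integer encoding version from a column's encoding kind. It is not taken from the patch. The class EncodingSketch, the enum IntegerRle, and the method integerRleFor are hypothetical names, and DICTIONARY is assumed to be the pre-existing dictionary kind that pairs with DIRECT.

```java
import java.util.EnumSet;
import java.util.Set;

final class EncodingSketch {
    /** Column encoding kinds; a local stand-in enum, not Hive's own class. */
    enum Kind { DIRECT, DICTIONARY, DIRECT_V2, DICTIONARY_V2 }

    /** Integer run-length encoding version implied by a column encoding kind. */
    enum IntegerRle { V1, V2 }

    // Assumption: only the *_V2 kinds signal that integer streams use the new encoding.
    private static final Set<Kind> V2_KINDS =
        EnumSet.of(Kind.DIRECT_V2, Kind.DICTIONARY_V2);

    /** Returns the integer encoding version a reader would use for this kind. */
    static IntegerRle integerRleFor(Kind kind) {
        return V2_KINDS.contains(kind) ? IntegerRle.V2 : IntegerRle.V1;
    }

    public static void main(String[] args) {
        // Print the mapping for every kind.
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " -> integer RLE " + integerRleFor(kind));
        }
    }
}
```

Without a separate DICTIONARY_V2 kind, a dictionary-encoded column would give the reader no signal about which integer encoding its data and length streams use, which is the gap the comment describes.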