[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717596#comment-13717596 ]
Owen O'Malley commented on HIVE-4123:
-------------------------------------

{quote}
1) In the current implementation, I kept the delta base field as optional (used only for fixed delta runs) and zigzag encoded the delta blob so that we don't have to deal with the sign of the deltas. I can change the delta base field to a mandatory field that stores the base (absolute min) value of the delta values, and zigzag encode it. With the base value and the delta base value, we should be able to identify whether the sequence is monotonically increasing or decreasing, and we can also identify the sign of the delta values. I hope this is what you are looking for. Please correct me if my understanding is wrong.
{quote}

I think it will be worthwhile always having the delta base and keeping the additional delta as an unsigned remainder.

{quote}
2) Is there any way we can reuse ORC's MAJOR and MINOR version, as supported in HIVE-4724, to figure out whether we need to use the new integer encoding or the old integer encoding?
{quote}

Yeah, I need to add more framework for that code. I'm leaning toward passing in a factory object that creates the right integer encoder.

> The RLE encoding for ORC can be improved
> ----------------------------------------
>
>                 Key: HIVE-4123
>                 URL: https://issues.apache.org/jira/browse/HIVE-4123
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Prasanth J
>         Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, ORC-Compression-Ratio-Comparison.xlsx
>
>
> The run length encoding of integers can be improved:
> * tighter bit packing
> * allow delta encoding
> * allow longer runs
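
To make the "mandatory delta base plus unsigned remainder" idea from the comment above concrete, here is a minimal Java sketch. It is one possible reading of the suggestion, not the code in the attached patches and not the ORC on-disk layout: the class `DeltaSketch`, the container `Encoded`, and all method names are hypothetical. It assumes the run is monotonic, takes the delta with the smallest magnitude as the signed delta base, zigzag encodes that base, and stores every other delta as a non-negative remainder on top of it. Overflow handling and bit packing are left out.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and layout are assumptions.
class DeltaSketch {

    /** One encoded run: first value, zigzag-encoded delta base, unsigned remainders. */
    static final class Encoded {
        final long first;
        final long zigzagBase;
        final long[] remainders;

        Encoded(long first, long zigzagBase, long[] remainders) {
            this.first = first;
            this.zigzagBase = zigzagBase;
            this.remainders = remainders;
        }
    }

    /** Maps signed values to unsigned ones: 0, -1, 1, -2, ... become 0, 1, 2, 3, ... */
    static long zigzagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    static long zigzagDecode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    static Encoded encode(long[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        int n = values.length - 1;
        long[] deltas = new long[n];
        boolean anyNegative = false;
        boolean anyPositive = false;
        long minMagnitude = Long.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            deltas[i] = values[i + 1] - values[i]; // overflow ignored in this sketch
            anyNegative |= deltas[i] < 0;
            anyPositive |= deltas[i] > 0;
            minMagnitude = Math.min(minMagnitude, Math.abs(deltas[i]));
        }
        if (anyNegative && anyPositive) {
            throw new IllegalArgumentException("sequence is not monotonic");
        }
        if (anyNegative && minMagnitude == 0) {
            // A zero base cannot carry a negative sign in this simple scheme.
            throw new IllegalArgumentException("decreasing run with a zero delta");
        }
        // The sign of the base tells the reader the direction of the whole run.
        long base = anyNegative ? -minMagnitude : minMagnitude;
        long[] remainders = new long[n];
        for (int i = 0; i < n; i++) {
            remainders[i] = Math.abs(deltas[i]) - minMagnitude; // always >= 0
        }
        return new Encoded(values[0], zigzagEncode(base), remainders);
    }

    static long[] decode(Encoded e) {
        long base = zigzagDecode(e.zigzagBase);
        long sign = base < 0 ? -1 : 1;
        long magnitude = Math.abs(base);
        long[] out = new long[e.remainders.length + 1];
        out[0] = e.first;
        for (int i = 0; i < e.remainders.length; i++) {
            out[i + 1] = out[i] + sign * (magnitude + e.remainders[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {1, 3, 8, 10};
        Encoded e = encode(values);
        System.out.println(Arrays.toString(e.remainders));
        System.out.println(Arrays.toString(decode(e)));
    }
}
```

In this reading, a fixed-delta run is just the case where every remainder is zero, so the reader never needs a separate "is the delta base present" flag, and the remainders can be bit packed without a sign bit.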