[
https://issues.apache.org/jira/browse/HBASE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628465#comment-13628465
]
Owen O'Malley commented on HBASE-8089:
--------------------------------------
Nick,
ORC gets a lot of mileage out of type-specific compression. In particular,
the integer columns use a vint representation (protobuf's varint encoding)
combined with run-length encoding. The string columns use an adaptive
dictionary approach: the writer chooses between dictionary and direct
encoding based on the first 100k values. That allows both a tighter
representation before turning on the relatively expensive zlib and even
tighter encodings when combined with zlib.
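
To make the integer case concrete, here is a minimal sketch in Python of a
protobuf-style varint combined with a naive run-length encoding. It is only an
illustration of the idea, not ORC's actual on-disk format: the function names
and the (run length, value) layout are assumptions made for this example, and
it handles non-negative integers only.

```python
import zlib


def encode_varint(n: int) -> bytes:
    """Protobuf-style base-128 varint for a non-negative integer."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def rle_encode(values) -> bytes:
    """Hypothetical layout: each run is stored as varint(run_length), varint(value)."""
    out = bytearray()
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        out += encode_varint(j - i)
        out += encode_varint(values[i])
        i = j
    return bytes(out)


def rle_decode(buf: bytes):
    """Inverse of rle_encode."""
    values = []
    pos = 0
    while pos < len(buf):
        run, pos = decode_varint(buf, pos)
        value, pos = decode_varint(buf, pos)
        values.extend([value] * run)
    return values


if __name__ == "__main__":
    column = [7] * 1000 + [300] * 500 + list(range(100))
    encoded = rle_encode(column)
    assert rle_decode(encoded) == column
    # The generic compressor can still be layered on top of the typed encoding.
    print(len(encoded), len(zlib.compress(encoded)))
```

The point of the sketch is the layering: the type-aware step shrinks the
column on its own, and zlib remains an optional second pass over the
already-compact bytes.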
> Add type support
> ----------------
>
> Key: HBASE-8089
> URL: https://issues.apache.org/jira/browse/HBASE-8089
> Project: HBase
> Issue Type: New Feature
> Components: Client
> Reporter: Nick Dimiduk
> Attachments: HBASE-8089-types.txt, HBASE-8089-types.txt,
> HBASE-8089-types.txt
>
>
> This proposal outlines an improvement to HBase that provides for a set of
> types, above and beyond the existing "byte-bucket" strategy. This is intended
> to reduce user-level duplication of effort, provide better support for
> 3rd-party integration, and provide an overall improved experience for
> developers using HBase.