Charles Pritchard created ORC-41:
------------------------------------
Summary: Using referenced columns for improved compression
Key: ORC-41
URL: https://issues.apache.org/jira/browse/ORC-41
Project: Orc
Issue Type: Improvement
Reporter: Charles Pritchard
Many data sets I work with have one column that essentially references
another: one column is a bigint and the other is a string, and the value of
the integer field always determines the value of the string field.
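A minimal sketch of how such a dependency could be detected, assuming the two columns are available as equal-length in-memory sequences. `determines` is a hypothetical helper for illustration, not part of any ORC API. It checks whether every key value always appears with the same dependent value, which covers both the bigint-to-string case and the bigint-to-bigint case.

```python
from typing import Hashable, Sequence


def determines(keys: Sequence[Hashable], values: Sequence[Hashable]) -> bool:
    """Return True if each key always appears with the same dependent value.

    Hypothetical helper for illustration; it is not part of the ORC API.
    """
    if len(keys) != len(values):
        raise ValueError("columns must have the same length")
    seen: dict = {}
    for k, v in zip(keys, values):
        # setdefault stores the first value seen for k and returns it on
        # later calls, so any mismatch means k maps to two different values.
        if seen.setdefault(k, v) != v:
            return False
    return True
```

For example, `determines([7, 42, 7], ["alpha", "beta", "alpha"])` holds, while changing the third name to `"ALPHA"` breaks the dependency. A writer would need to run a check like this per stripe or per file, since the dependency may not hold across all data.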
I also work with data sets where one bigint field always determines the value
of another bigint field, likely as part of a tree.
There is an opportunity to achieve better compression by identifying these
cases and adding logic for such cross-column/dictionary lookups.
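The idea could look roughly like the following sketch, which assumes whole columns in memory: instead of storing the dependent column row by row, store one key-to-value lookup table and rebuild the dependent column from the key column on read. `encode_dependent` and `decode_dependent` are hypothetical names for illustration, not ORC APIs. ORC can already dictionary-encode a string column, but each row still carries a dictionary index; with a cross-column lookup that per-row index would be implied by the key column.

```python
from typing import Dict, Hashable, List, Sequence


def encode_dependent(
    keys: Sequence[Hashable], values: Sequence[Hashable]
) -> Dict[Hashable, Hashable]:
    """Replace a dependent column with a key -> value lookup table.

    Hypothetical helper for illustration; not an ORC API. Raises ValueError
    if `keys` does not functionally determine `values`.
    """
    if len(keys) != len(values):
        raise ValueError("columns must have the same length")
    lookup: Dict[Hashable, Hashable] = {}
    for k, v in zip(keys, values):
        # Record the first value seen for each key; a later mismatch means
        # the dependency does not hold and this encoding cannot be used.
        if lookup.setdefault(k, v) != v:
            raise ValueError(f"key {k!r} maps to more than one value")
    return lookup


def decode_dependent(
    keys: Sequence[Hashable], lookup: Dict[Hashable, Hashable]
) -> List[Hashable]:
    """Rebuild the dependent column row by row from the key column."""
    return [lookup[k] for k in keys]


# Five rows, but only three distinct (id, name) pairs need to be stored.
ids = [7, 42, 7, 13, 42]
names = ["alpha", "beta", "alpha", "gamma", "beta"]
lookup = encode_dependent(ids, names)
assert decode_dependent(ids, lookup) == names
```

The design choice here is to fail loudly when the dependency is violated; a real writer would more likely fall back to encoding the column normally.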