Charles Pritchard created ORC-41:
------------------------------------

             Summary: Using referenced columns for improved compression
                 Key: ORC-41
                 URL: https://issues.apache.org/jira/browse/ORC-41
             Project: Orc
          Issue Type: Improvement
            Reporter: Charles Pritchard


Many data sets I work with have one column that essentially references
another, where one column is a bigint and the other is a string. It is
always the case that the value of the integer field determines the value of
the string field.

I also work with data sets where one bigint field always determines the
value of another bigint field, likely as part of a tree-structured hierarchy.

There is an opportunity to achieve better compression by detecting these
cases and adding logic for cross-column dictionary lookups, so that the
dependent column is stored as a lookup rather than written out in full.
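
As a rough illustration of the idea (this is not ORC's API; the function
names and sample data below are hypothetical), a writer could check whether
one column functionally determines another and, if it does, keep only the
determining column plus a small lookup dictionary:

```python
from typing import Dict, Hashable, List, Optional, Sequence


def build_reference_dictionary(
    keys: Sequence[Hashable], values: Sequence[object]
) -> Optional[Dict[Hashable, object]]:
    """Return a key -> value mapping if each key always pairs with the
    same value; return None if the columns are not in that relationship."""
    mapping: Dict[Hashable, object] = {}
    for key, value in zip(keys, values):
        # setdefault stores the first value seen for a key and returns
        # the stored value on later occurrences.
        if mapping.setdefault(key, value) != value:
            return None  # same key, different value: no dependency
    return mapping


def decode_referenced_column(
    keys: Sequence[Hashable], mapping: Dict[Hashable, object]
) -> List[object]:
    """Rebuild the dependent column from the key column and the mapping."""
    return [mapping[key] for key in keys]


# Hypothetical sample data: a bigint id column and a string column.
ids = [7, 7, 3, 7, 3, 9]
names = ["alice", "alice", "bob", "alice", "bob", "carol"]

mapping = build_reference_dictionary(ids, names)
if mapping is not None:
    # Only `ids` and `mapping` would need to be stored; `names` can be
    # reconstructed on read.
    restored = decode_referenced_column(ids, mapping)
```

In this sketch the string column shrinks from one entry per row to one entry
per distinct id. The same check would apply to the bigint-to-bigint case.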



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
