[ 
https://issues.apache.org/jira/browse/HIVE-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160687#comment-13160687
 ] 

alex gemini commented on HIVE-2097:
-----------------------------------

Selectivity plays an important role in columnar databases because they use 
run-length encoding to compress most dimension-attribute columns. For 
example, take a log table: create table (gender,age,region,message). We know 
that the selectivity order is gender = 1/2 > age = 1/20 > region = 1/300. We 
can order the table columns as #1 (gender,age,region,message) or 
#2 (region,age,gender,message). For #1, we only need (2 + 2*20 + 2*20*300 
+ num_of_message) run-length entries to store all the records in one DFS 
block, but if we organize the table like #2, we need (300 + 300*20 + 300*20*2 
+ num_of_message). Ignoring num_of_message, #1 takes only about 66% of the 
space #2 requires (12042 vs. 18300 entries). The only difference is that 
run-length encoding uses space more efficiently when the table is organized 
by selectivity, with the lowest-cardinality column first.
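
The arithmetic above can be checked with a small sketch. It assumes that 
every combination of gender, age and region occurs exactly once and that 
rows are sorted by the columns in the given order; the helper names 
rle_runs and total_runs are hypothetical and not part of Hive or RCFile.

```python
from itertools import product

def rle_runs(column):
    """Count run-length-encoded runs (maximal stretches of equal values)."""
    runs = 0
    prev = object()  # sentinel that compares unequal to every value
    for v in column:
        if v != prev:
            runs += 1
            prev = v
    return runs

def total_runs(cardinalities):
    """Total runs over all columns for rows sorted in the given column order.

    Assumption: every combination of values occurs once, as in the estimate.
    """
    # product() yields rows already sorted lexicographically by column order
    rows = list(product(*(range(c) for c in cardinalities)))
    return sum(rle_runs([r[i] for r in rows]) for i in range(len(cardinalities)))

order_1 = total_runs([2, 20, 300])  # gender, age, region
order_2 = total_runs([300, 20, 2])  # region, age, gender
print(order_1, order_2, round(order_1 / order_2, 2))  # 12042 18300 0.66
```

The message column is left out, since it contributes the same 
num_of_message term to both layouts.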
                
> Explore mechanisms for better compression with RC Files
> -------------------------------------------------------
>
>                 Key: HIVE-2097
>                 URL: https://issues.apache.org/jira/browse/HIVE-2097
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>            Priority: Minor
>
> Optimization of the compression mechanisms used by RC File to be explored.
> Some initial ideas:
>
> 1. More efficient serialization/deserialization based on type-specific and 
>    storage-specific knowledge.
>    For instance, storing sorted numeric values efficiently using some delta 
>    coding techniques.
> 2. More efficient compression based on type-specific and storage-specific 
>    knowledge.
>    Enable compression codecs to be specified based on types or individual 
>    columns.
> 3. Reordering the on-disk storage for better compression efficiency.
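
The delta coding mentioned in idea 1 of the issue description can be 
illustrated with a minimal sketch. This is only one simple form of delta 
coding, not the scheme proposed for RCFile; the function names and the 
sample values are made up for illustration.

```python
def delta_encode(sorted_values):
    """Keep the first value, then each difference from the previous value."""
    out = []
    prev = 0
    for v in sorted_values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out

timestamps = [1000, 1003, 1004, 1010, 1010]  # hypothetical sorted column
encoded = delta_encode(timestamps)
print(encoded)  # [1000, 3, 1, 6, 0]
assert delta_decode(encoded) == timestamps
```

For sorted input the differences are small non-negative numbers, which a 
general-purpose codec or a variable-length integer encoding can usually 
store in fewer bytes than the raw values.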

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
