[jira] Commented: (HIVE-819) Add lazy decompress ability to RCFile

Ning Zhang (JIRA) Thu, 17 Sep 2009 11:22:20 -0700

    [ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756674#action_12756674 ]


Ning Zhang commented on HIVE-819:
---------------------------------

A few general comments: 

 1) Can you briefly summarize how decompression is currently done and what your
proposal for lazy decompression changes? More comments in the code would also
be very helpful.

 2) Is the 4-second performance regression with the query predicate
duration > 8 consistent or intermittent? If it is consistent, are there any
additional changes that cause this regression? (I thought the worst case would
be decompressing all columns, as you mentioned, which is equivalent to the
previous behavior.) If it is intermittent, what timing method are you using? If
you have YourKit, can you also do CPU profiling?

> Add lazy decompress ability to RCFile
> -------------------------------------
>
>                 Key: HIVE-819
>                 URL: https://issues.apache.org/jira/browse/HIVE-819
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>             Fix For: 0.5.0
>
>         Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for filter scans.
> For example, for the query 'select a, b, c from table_rc_lazydecompress where
> a>1;', we only need to decompress the block data of columns b and c when at
> least one row's column 'a' in that block satisfies the filter condition.
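The lazy scheme in the issue description can be sketched in Java as follows.
This is a minimal, hypothetical model, not the attached patch or RCFile's API:
LazyColumnBlock, column() and scan() are illustrative names, each block is
assumed to hold one separately compressed chunk per column (java.util.zip
Deflater stands in for whatever codec the table uses), and the filter column a
is column 0.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical model of one row group: every column is compressed separately,
// and a column is only inflated the first time it is read.
class LazyColumnBlock {
    private final byte[][] compressed;
    private final int[][] cache;   // decompressed columns, null until first access
    private final int rowCount;
    int decompressed = 0;          // number of column chunks inflated so far (for illustration)

    LazyColumnBlock(int[][] columns) {
        rowCount = columns[0].length;
        compressed = new byte[columns.length][];
        cache = new int[columns.length][];
        for (int c = 0; c < columns.length; c++) compressed[c] = deflate(columns[c]);
    }

    /** Returns column c, inflating its chunk only on the first request. */
    int[] column(int c) {
        if (cache[c] == null) {
            cache[c] = inflate(compressed[c], rowCount);
            decompressed++;
        }
        return cache[c];
    }

    private static byte[] deflate(int[] values) {
        ByteBuffer raw = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) raw.putInt(v);
        Deflater deflater = new Deflater();
        deflater.setInput(raw.array());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    private static int[] inflate(byte[] chunk, int count) {
        byte[] raw = new byte[count * Integer.BYTES];
        Inflater inflater = new Inflater();
        inflater.setInput(chunk);
        try {
            int off = 0;
            while (off < raw.length) {
                int n = inflater.inflate(raw, off, raw.length - off);
                if (n == 0) throw new IllegalStateException("truncated column chunk");
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) values[i] = buf.getInt();
        return values;
    }

    // select a, b, c ... where a > threshold: column 0 (a) is always inflated,
    // columns 1 and 2 (b, c) only once some row in the block passes the filter.
    static List<int[]> scan(List<LazyColumnBlock> blocks, int threshold) {
        List<int[]> rows = new ArrayList<>();
        for (LazyColumnBlock block : blocks) {
            int[] a = block.column(0);
            for (int r = 0; r < a.length; r++) {
                if (a[r] > threshold) {
                    int[] b = block.column(1), c = block.column(2);
                    rows.add(new int[] {a[r], b[r], c[r]});
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        LazyColumnBlock miss = new LazyColumnBlock(new int[][] {{0, 1, 0}, {10, 11, 12}, {20, 21, 22}});
        LazyColumnBlock hit = new LazyColumnBlock(new int[][] {{0, 5, 1}, {13, 14, 15}, {23, 24, 25}});
        List<int[]> rows = scan(List.of(miss, hit), 1);
        System.out.println(rows.size() + " row(s); chunks inflated: "
                + (miss.decompressed + hit.decompressed) + " of 6");
    }
}
```

With this sample data the first block has no row with a > 1, so only its 'a'
chunk is inflated, and 4 of the 6 chunks are decompressed in total. If every
block contains a matching row, every chunk is inflated, which in this sketch
matches eager decompression apart from the per-access null check.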

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
