[
https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756674#action_12756674
]
Ning Zhang commented on HIVE-819:
---------------------------------
A few general comments:
1) Can you briefly summarize how decompression is currently done and your
proposal for lazy decompression? Also, more comments in the code would be
very helpful.
2) Is the 4-second performance regression with the query predicate
"duration > 8" consistent or intermittent? If it is consistent, are there any
additional changes that cause this regression? (I thought the worst case would
be decompressing all columns, as you mentioned, which is equivalent to the
previous behavior.) If it is intermittent, what timing method are you using?
If you have YourKit, can you also do CPU profiling?
> Add lazy decompress ability to RCFile
> -------------------------------------
>
> Key: HIVE-819
> URL: https://issues.apache.org/jira/browse/HIVE-819
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor, Serializers/Deserializers
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Fix For: 0.5.0
>
> Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for filter scans.
> For example, for the query 'select a, b, c from table_rc_lazydecompress where
> a>1;' we only need to decompress the block data of columns b and c when at
> least one row's column 'a' in that block satisfies the filter condition.
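To make the idea in the issue description concrete, here is a minimal, self-contained Java sketch of per-column lazy decompression. It is not the attached patch's implementation: LazyColumnBlock, scan and decompressCount are hypothetical names, java.util.zip Deflater/Inflater stand in for whatever codec the RCFile is written with, and each row group is modeled as three in-memory int columns a, b and c.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class LazyDecompressDemo {
    // Counts real decompressions so the saving is visible (illustration only).
    static int decompressCount = 0;

    /** Hypothetical column block: keeps compressed bytes, inflates them on first get(). */
    static final class LazyColumnBlock {
        final int rowCount;
        private final byte[] compressed;
        private int[] values; // stays null until some row of this block is read

        LazyColumnBlock(int[] raw) {
            ByteBuffer buf = ByteBuffer.allocate(raw.length * Integer.BYTES);
            for (int v : raw) buf.putInt(v);
            Deflater deflater = new Deflater();
            deflater.setInput(buf.array());
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            deflater.end();
            compressed = out.toByteArray();
            rowCount = raw.length;
        }

        int get(int row) {
            if (values == null) values = inflate(); // decompress lazily, at most once
            return values[row];
        }

        private int[] inflate() {
            byte[] bytes = new byte[rowCount * Integer.BYTES];
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            try {
                int off = 0;
                while (off < bytes.length && !inflater.finished()) {
                    off += inflater.inflate(bytes, off, bytes.length - off);
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException(e);
            } finally {
                inflater.end();
            }
            decompressCount++;
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            int[] result = new int[rowCount];
            for (int i = 0; i < rowCount; i++) result[i] = buf.getInt();
            return result;
        }
    }

    /** "select a, b, c where a > threshold": b and c are touched only for matching rows. */
    static List<int[]> scan(List<LazyColumnBlock[]> rowGroups, int threshold) {
        List<int[]> results = new ArrayList<>();
        for (LazyColumnBlock[] group : rowGroups) {
            LazyColumnBlock a = group[0], b = group[1], c = group[2];
            for (int row = 0; row < a.rowCount; row++) {
                if (a.get(row) > threshold) {
                    // First match in this group triggers decompression of b and c.
                    results.add(new int[] {a.get(row), b.get(row), c.get(row)});
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<LazyColumnBlock[]> groups = new ArrayList<>();
        // Group 1: no value of a exceeds 1, so b and c are never decompressed.
        groups.add(new LazyColumnBlock[] {new LazyColumnBlock(new int[] {0, 1, 1}),
            new LazyColumnBlock(new int[] {10, 11, 12}), new LazyColumnBlock(new int[] {20, 21, 22})});
        // Group 2: one row matches, so b and c are decompressed once each.
        groups.add(new LazyColumnBlock[] {new LazyColumnBlock(new int[] {1, 5, 0}),
            new LazyColumnBlock(new int[] {13, 14, 15}), new LazyColumnBlock(new int[] {23, 24, 25})});
        for (int[] r : scan(groups, 1)) System.out.println(r[0] + "," + r[1] + "," + r[2]); // 5,14,24
        System.out.println("decompressions: " + decompressCount); // decompressions: 4
    }
}
```

An eager reader would inflate all six blocks for the same scan; here the b and c blocks of the first row group are never touched. The trade-off is a null check on every access, and, as in the description, a block is still fully inflated as soon as any one of its rows passes the filter.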
--
This message is automatically generated by JIRA.