[ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756978#action_12756978 ]
Ning Zhang commented on HIVE-819:
---------------------------------
Yongqiang, thanks for the explanation! Below are some more detailed comments:
1) In RCFile.c:307, it seems decompress() can be called multiple times, but the
function doesn't check whether the data is already decompressed and return
early if so. This does not cause a problem in this diff, since the callers
check whether the data is decompressed before calling decompress(). However,
it is a public function, and nothing prevents future callers from calling it
twice. So it would be better to implement this check inside decompress() itself.
2) The same decompress() function also doesn't seem to work correctly when
the column is not compressed. Can you double-check it?
3) Please add unit tests or qfiles for the following cases:
- storage dimension:
(1) fields are compressed
(2) fields are uncompressed
- query dimension:
(a) 1 column in the where-clause
(b) 2 references to the same column in the where-clause (e.g., a > 2 and
a < 5)
(c) 2 references to the same column, one in the where-clause and one in the
group-by clause (e.g., where a > 2 group by a).
That gives 6 test cases from the combinations of the 2 dimensions. For (b)
and (c), please check that the actual column decompression is done only once.
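
To illustrate points 1) and 2) and the "only once" check in 3), here is a minimal sketch of a guard inside decompress(). It is not the RCFile code: the class LazyColumnBuffer, its fields, the inflateCount() counter, and the deflate() helper are hypothetical names, and java.util.zip stands in for whatever codec the file actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-in for a per-column buffer; not the actual RCFile class. */
final class LazyColumnBuffer {
    private byte[] data;          // stored bytes until decompress() does real work
    private boolean decompressed; // guard checked inside decompress()
    private int inflateCount;     // how many times real decompression ran

    LazyColumnBuffer(byte[] stored, boolean compressed) {
        this.data = stored;
        // An uncompressed column is already usable, so mark it as done.
        this.decompressed = !compressed;
    }

    /** Safe to call any number of times; inflates at most once. */
    byte[] decompress() {
        if (decompressed) {
            return data; // already decompressed, or never compressed
        }
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported column data");
                }
                out.write(chunk, 0, n);
            }
            data = out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt column data", e);
        } finally {
            inflater.end();
        }
        decompressed = true;
        inflateCount++;
        return data;
    }

    /** Exposed so a test can assert that decompression happened only once. */
    int inflateCount() {
        return inflateCount;
    }

    /** Helper for building compressed test input. */
    static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

A counter like inflateCount() is one way a unit test for cases (b) and (c) could verify that a column referenced twice is still decompressed only once.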
> Add lazy decompress ability to RCFile
> -------------------------------------
>
> Key: HIVE-819
> URL: https://issues.apache.org/jira/browse/HIVE-819
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Query Processor, Serializers/Deserializers
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Fix For: 0.5.0
>
> Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for filter scans.
> For example, for the query 'select a, b, c from table_rc_lazydecompress where
> a>1;' we only need to decompress the block data of columns b and c when at
> least one row's column 'a' in that block satisfies the filter condition.
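
The behaviour described in the issue can be sketched roughly as follows. This is only an illustration under assumptions: LazyIntColumn, LazyFilterScan, and the Supplier-based decoder are hypothetical and do not reflect how the patch is structured. The point is that columns b and c are decoded only when some row in the block passes the filter on column a.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical lazy column: decodes its block on first access only. */
final class LazyIntColumn {
    private final Supplier<int[]> decoder; // stands in for block decompression
    private int[] values;
    private int decodeCount;

    LazyIntColumn(Supplier<int[]> decoder) {
        this.decoder = decoder;
    }

    int get(int row) {
        if (values == null) {
            values = decoder.get();
            decodeCount++;
        }
        return values[row];
    }

    int decodeCount() {
        return decodeCount;
    }
}

final class LazyFilterScan {
    /** Evaluates "select a, b, c ... where a > threshold" over one block. */
    static List<int[]> scan(LazyIntColumn a, LazyIntColumn b, LazyIntColumn c,
                            int rows, int threshold) {
        List<int[]> out = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            int av = a.get(r); // the filter column is always needed
            if (av > threshold) {
                // b and c are touched only for rows that pass the filter
                out.add(new int[] { av, b.get(r), c.get(r) });
            }
        }
        return out;
    }
}
```

If no row in a block satisfies a > 1, the decoders for b and c are never invoked for that block.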