[ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688725#action_12688725
 ] 

Joydeep Sen Sarma commented on HIVE-352:
----------------------------------------

It's not clear to me that we need to ditch SequenceFile for the short term. 
Like Prasad said - we can impose our own structure on the SequenceFile record, 
which would allow skipping unnecessary data.

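We obviously cannot use record compression, since it compresses each value as 
a whole and a reader would have to decompress every column to get at one. 
There are two approaches you can take: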

1. Keep using a BytesWritable (or Text) for the 'value' part and impose your 
own layout inside it, so that the ColumnarSerDe only needs to seek to and 
decompress the relevant columns. This does require one copy of the entire data 
from the SequenceFile 'value' to the BytesWritable.
2. Use the Hadoop serializer framework (see 
src/core/org/apache/hadoop/io/serializer) and get Hadoop to pass you the input 
stream directly (for reading the 'value' part). The custom deserializer can 
then be configured via Hive's plan to only copy out the bytes that are of 
interest to the Hive plan.
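
To make approach 1 a bit more concrete, here is a rough sketch of one possible 
layout inside the value bytes. Everything in it is an assumption for 
illustration: the class and method names are made up, a plain byte[] stands in 
for the bytes a BytesWritable would hold, and java.util.zip stands in for 
whatever codec would really be used. The idea is just a small header of 
per-column lengths followed by independently compressed columns, so a reader 
can seek past the columns it does not need.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical layout of one 'value':
//   [int columnCount]
//   [int compressedLen, int rawLen]  (repeated columnCount times)
//   [compressed column 0][compressed column 1]...
class ColumnarValueSketch {

    // Compress one column on its own so it can later be inflated in isolation.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Build the value bytes: header first, then the compressed columns.
    static byte[] encode(byte[][] columns) {
        byte[][] packed = new byte[columns.length][];
        int size = 4 + 8 * columns.length;
        for (int i = 0; i < columns.length; i++) {
            packed[i] = deflate(columns[i]);
            size += packed[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(columns.length);
        for (int i = 0; i < columns.length; i++) {
            buf.putInt(packed[i].length);
            buf.putInt(columns[i].length);
        }
        for (byte[] p : packed) {
            buf.put(p);
        }
        return buf.array();
    }

    // Read the header, compute the offset of the wanted column, and
    // decompress only that column. Other columns are never inflated.
    static byte[] readColumn(byte[] value, int wanted) {
        ByteBuffer buf = ByteBuffer.wrap(value);
        int count = buf.getInt();
        if (wanted < 0 || wanted >= count) {
            throw new IllegalArgumentException("no such column: " + wanted);
        }
        int offset = 4 + 8 * count;
        int compressedLen = 0;
        int rawLen = 0;
        for (int i = 0; i <= wanted; i++) {
            compressedLen = buf.getInt();
            rawLen = buf.getInt();
            if (i < wanted) {
                offset += compressedLen; // skip over a column we do not need
            }
        }
        Inflater inflater = new Inflater();
        inflater.setInput(value, offset, compressedLen);
        byte[] raw = new byte[rawLen];
        try {
            int done = 0;
            while (done < rawLen) {
                int n = inflater.inflate(raw, done, rawLen - done);
                if (n == 0) {
                    break;
                }
                done += n;
            }
            if (done != rawLen) {
                throw new IllegalStateException("truncated column " + wanted);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return raw;
    }
}
```

Note that the whole value is still copied into the byte[] once, which is the 
cost mentioned in approach 1; the saving is in decompression and parsing.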

#2 is obviously more complicated - and in practice straight-line copies of 
hot data are not that expensive (Hadoop has already done a CRC check on all 
this data, so it's typically already in processor caches and fast to scan 
again).
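
For comparison, a rough sketch of what the reading side of #2 might look like. 
It is only shaped after Hadoop's Deserializer (open / deserialize / close) and 
deliberately does not implement that interface, so it compiles without Hadoop 
on the classpath. The class name, the uncompressed length-prefixed layout, and 
the boolean[] of wanted columns (standing in for whatever the Hive plan would 
supply) are all assumptions for illustration.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical stream layout of one 'value':
//   [int columnCount][int len] * columnCount [column 0 bytes][column 1 bytes]...
class ProjectingDeserializerSketch {
    private final boolean[] wanted; // hypothetical: columns the plan needs
    private DataInputStream in;

    ProjectingDeserializerSketch(boolean[] wanted) {
        this.wanted = wanted;
    }

    void open(InputStream stream) {
        in = new DataInputStream(stream);
    }

    // Reads one row from the stream. Wanted columns are copied out;
    // skipped columns are left as null and their bytes are never copied.
    byte[][] deserialize() {
        try {
            int count = in.readInt();
            int[] lengths = new int[count];
            for (int i = 0; i < count; i++) {
                lengths[i] = in.readInt();
            }
            byte[][] row = new byte[count][];
            for (int i = 0; i < count; i++) {
                if (i < wanted.length && wanted[i]) {
                    row[i] = new byte[lengths[i]];
                    in.readFully(row[i]);
                } else {
                    skipFully(lengths[i]);
                }
            }
            return row;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // skipBytes may skip fewer bytes than asked, so loop until done.
    private void skipFully(int n) throws IOException {
        while (n > 0) {
            int skipped = in.skipBytes(n);
            if (skipped <= 0) {
                throw new EOFException("value ended inside a column");
            }
            n -= skipped;
        }
    }

    void close() {
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper that writes the layout above, only so the sketch can be tried out.
    static byte[] encode(byte[][] columns) {
        int size = 4 + 4 * columns.length;
        for (byte[] c : columns) {
            size += c.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(columns.length);
        for (byte[] c : columns) {
            buf.putInt(c.length);
        }
        for (byte[] c : columns) {
            buf.put(c);
        }
        return buf.array();
    }
}
```

The extra moving parts (wiring the deserializer into Hadoop and getting the 
column list to it through the plan) are not shown, and they are where most of 
the added complexity would be.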

So I would try out #1 to begin with. 

> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>
> Column based storage has been proven a better storage layout for OLAP. 
> Hive does a great job on raw row oriented storage. In this issue, we will 
> enhance Hive to support column based storage. 
> Actually we have done some work on column based storage on top of HDFS. I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
