[jira] Commented: (HIVE-352) Make Hive support column based storage

He Yongqiang (JIRA) Thu, 23 Apr 2009 15:40:52 -0700

    [ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702146#action_12702146 ]


He Yongqiang commented on HIVE-352:
-----------------------------------

>>Can we also get some numbers on the amount of memory usage? 
I reran the test (the same test as Zheng's, but with no native codec) on my 
local machine, using the local fs and DefaultCodec. It read all columns of an 
RCFile with 80 columns and 100000 rows (size: 91849881 bytes).
The maximum memory usage is shown below (I ran the command 'ps -o 
vsz,rss,rsz,%mem -p 549' about once a minute):
     VSZ    RSS    RSZ %MEM
  766732  63472  63472 -3.0
BTW, my physical memory is 3GB.
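
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of polling `ps` and keeping the peak values. It is not the procedure used for the numbers above, only an illustration; the helper names (`parse_ps`, `sample`, `track_peak`), the 60-second interval, and the sample count are assumptions. `ps` normally reports VSZ and RSS in kilobytes (1024-byte units), so the RSS above is roughly 62 MB.

```python
import subprocess
import time


def parse_ps(output):
    """Parse two-line `ps -o ...` output (header + one row) into {column: number}."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header, values = lines[0].split(), lines[1].split()
    return {name: float(val) for name, val in zip(header, values)}


def sample(pid):
    """Take one VSZ/RSS sample for `pid` (hypothetical helper)."""
    out = subprocess.run(
        ["ps", "-o", "vsz,rss", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)


def track_peak(pid, interval_s=60, samples=10):
    """Poll `pid` and return the largest value seen for each column."""
    peak = {}
    for _ in range(samples):
        try:
            current = sample(pid)
        except subprocess.CalledProcessError:
            break  # ps exits non-zero once the process is gone
        for name, value in current.items():
            peak[name] = max(peak.get(name, value), value)
        time.sleep(interval_s)
    return peak


# The sample reported above, parsed with the same helper.
reported = parse_ps("   VSZ    RSS    RSZ %MEM\n766732  63472  63472 -3.0\n")
rss_mb = reported["RSS"] / 1024  # kilobytes -> megabytes
```

Calling `track_peak(549)` would poll the process id used above; polling misses short spikes between samples, so the result is a lower bound on the true peak.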

>>Was this just a hdfs read or the measurement of a Hive query?
The test was just a file read test.

However, with no native codec, my results differ a lot from Zheng's: 
SequenceFile does much worse in my test.
{noformat}
Write RCFile with 80 random string columns and 100000 rows cost 30643 
milliseconds. And the file's on disk size is 91849881
Write SequenceFile with 80 random string columns and 100000 rows cost 62034 
milliseconds. And the file's on disk size is 102521005
Read only one column of a RCFile with 80 random string columns and 100000 rows 
cost 703 milliseconds.
Read only first and last columns of a RCFile with 80 random string columns and 
100000 rows cost 526 milliseconds.
Read all columns of a RCFile with 80 random string columns and 100000 rows cost 
3131 milliseconds.
Read SequenceFile with 80  random string columns and 100000 rows cost 47876 
milliseconds.
{noformat}
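
To make the figures above easier to compare, the following sketch derives a few ratios from them. The numbers are copied from the output above; the variable names are made up for illustration, and the ratios describe a single run on one machine, so they should be read as rough indications only.

```python
# Figures copied from the benchmark output above (times in ms, sizes in bytes).
rcfile = {
    "write_ms": 30643,
    "size_bytes": 91849881,
    "read_one_col_ms": 703,
    "read_first_last_ms": 526,
    "read_all_ms": 3131,
}
seqfile = {
    "write_ms": 62034,
    "size_bytes": 102521005,
    "read_all_ms": 47876,
}

# On-disk size of the RCFile relative to the SequenceFile.
size_ratio = rcfile["size_bytes"] / seqfile["size_bytes"]

# How many times longer the SequenceFile took to write and to read in full.
write_speedup = seqfile["write_ms"] / rcfile["write_ms"]
read_all_speedup = seqfile["read_all_ms"] / rcfile["read_all_ms"]

# Cost of reading one RCFile column relative to reading all 80 columns.
one_col_fraction = rcfile["read_one_col_ms"] / rcfile["read_all_ms"]

# Full-read throughput of the RCFile in MB/s (1 MB = 10**6 bytes here).
read_all_mb_per_s = rcfile["size_bytes"] / 1e6 / (rcfile["read_all_ms"] / 1000)

print(f"RCFile size / SequenceFile size: {size_ratio:.3f}")
print(f"Write time ratio (Seq / RC):     {write_speedup:.2f}")
print(f"Read-all time ratio (Seq / RC):  {read_all_speedup:.2f}")
print(f"One column / all columns (RC):   {one_col_fraction:.3f}")
print(f"RCFile read-all throughput:      {read_all_mb_per_s:.1f} MB/s")
```

In this run the full SequenceFile read took about 15 times as long as the full RCFile read, while the write took about twice as long.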

Why does the native codec matter so much for SequenceFile and not for RCFile? 
It should influence both RCFile and SequenceFile in the same way.

> Make Hive support column based storage
> --------------------------------------
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22 
> progress.txt, hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> hive-352-2009-4-17.patch, hive-352-2009-4-19.patch, 
> hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch, 
> hive-352-2009-4-23.patch, HIve-352-draft-2009-03-28.patch, 
> Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP. 
> Hive does a great job on raw row-oriented storage. In this issue, we will 
> enhance Hive to support column-based storage. 
> Actually, we have done some work on column-based storage on top of HDFS; I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
