[
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702146#action_12702146
]
He Yongqiang commented on HIVE-352:
-----------------------------------
>>Can we also get some numbers on the amount of memory usage?
I reran the test (the same test as Zheng's, but with no native codec) on my local
machine using the local fs and DefaultCodec. It read all columns of an RCFile with 80
columns and 100000 rows (size: 91849881 bytes).
The maximum memory usage is shown below (I ran the command 'ps -o
vsz,rss,rsz,%mem -p 549' every minute):
VSZ     RSS    RSZ    %MEM
766732  63472  63472  -3.0
BTW, my physical memory is 3GB.
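The periodic 'ps' sampling described above can be automated. The following is a minimal sketch, not the procedure actually used for this test: the function names (parse_ps_sample, sample_peak, kib_to_mib) are hypothetical, and it assumes 'ps' reports VSZ and RSS in KiB, which is typical on Linux and BSD-style systems but should be checked against the local man page.

```python
import subprocess
import time


def parse_ps_sample(text):
    """Parse one ps output (header line + one process row) into a dict of floats."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header, row = lines[0].split(), lines[1].split()
    return {name: float(value) for name, value in zip(header, row)}


def kib_to_mib(kib):
    """Convert KiB (the assumed ps unit for VSZ/RSS) to MiB."""
    return kib / 1024.0


def sample_peak(pid, samples=5, interval_s=60.0):
    """Run ps against `pid` several times and keep the maximum of each column."""
    peak = {}
    for i in range(samples):
        out = subprocess.run(
            ["ps", "-o", "vsz,rss", "-p", str(pid)],
            capture_output=True, text=True, check=True,
        ).stdout
        for name, value in parse_ps_sample(out).items():
            peak[name] = max(peak.get(name, value), value)
        if i < samples - 1:
            time.sleep(interval_s)  # wait before taking the next sample
    return peak
```

Feeding the row reported above into parse_ps_sample and converting RSS with kib_to_mib gives roughly 62 MiB of resident memory, under the KiB assumption.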
>>Was this just a hdfs read or the measurement of a Hive query?
The test was just a file read test.
However, with no native codec, my results differ considerably from Zheng's:
SequenceFile does much worse in my test.
{noformat}
Write RCFile with 80 random string columns and 100000 rows cost 30643
milliseconds. And the file's on disk size is 91849881
Write SequenceFile with 80 random string columns and 100000 rows cost 62034
milliseconds. And the file's on disk size is 102521005
Read only one column of a RCFile with 80 random string columns and 100000 rows
cost 703 milliseconds.
Read only first and last columns of a RCFile with 80 random string columns and
100000 rows cost 526 milliseconds.
Read all columns of a RCFile with 80 random string columns and 100000 rows cost
3131 milliseconds.
Read SequenceFile with 80 random string columns and 100000 rows cost 47876
milliseconds.
{noformat}
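To make the comparison easier to read, the ratios implied by the numbers above can be computed directly. This is only arithmetic on the reported figures; the dictionary keys and the helper name 'ratio' are illustrative and do not come from the test program.

```python
# Timings (milliseconds) and on-disk sizes (bytes) copied from the output above.
timings_ms = {
    "rcfile_write": 30643,
    "seqfile_write": 62034,
    "rcfile_read_one_column": 703,
    "rcfile_read_first_and_last": 526,
    "rcfile_read_all": 3131,
    "seqfile_read_all": 47876,
}
sizes_bytes = {"rcfile": 91849881, "seqfile": 102521005}


def ratio(numerator, denominator):
    """Plain ratio; values above 1 mean the numerator is larger."""
    return numerator / denominator


# How many times slower SequenceFile was than RCFile in this run.
write_slowdown = ratio(timings_ms["seqfile_write"], timings_ms["rcfile_write"])
read_all_slowdown = ratio(timings_ms["seqfile_read_all"], timings_ms["rcfile_read_all"])

# RCFile size as a fraction of the SequenceFile size.
size_fraction = ratio(sizes_bytes["rcfile"], sizes_bytes["seqfile"])

# Fraction of the full-read time needed to read a single RCFile column.
one_column_fraction = ratio(
    timings_ms["rcfile_read_one_column"], timings_ms["rcfile_read_all"]
)

if __name__ == "__main__":
    print(f"write slowdown (SequenceFile / RCFile):    {write_slowdown:.2f}x")
    print(f"read-all slowdown (SequenceFile / RCFile): {read_all_slowdown:.2f}x")
    print(f"RCFile size / SequenceFile size:           {size_fraction:.3f}")
    print(f"one-column read / all-column read time:    {one_column_fraction:.3f}")
```

In this run SequenceFile took about twice as long to write and about fifteen times as long to read, while the RCFile was roughly 10% smaller on disk.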
Why does the native codec matter so much for SequenceFile and not for RCFile? It should
influence both RCFile and SequenceFile in the same way.
> Make Hive support column based storage
> --------------------------------------
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: 4-22 performace2.txt, 4-22 performance.txt, 4-22
> progress.txt, hive-352-2009-4-15.patch, hive-352-2009-4-16.patch,
> hive-352-2009-4-17.patch, hive-352-2009-4-19.patch,
> hive-352-2009-4-22-2.patch, hive-352-2009-4-22.patch,
> hive-352-2009-4-23.patch, HIve-352-draft-2009-03-28.patch,
> Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP.
> Hive does a great job on raw row-oriented storage. In this issue, we will
> enhance Hive to support column-based storage.
> Actually, we have done some work on column-based storage on top of HDFS; I
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?