I just corrected the wiki page. It would also be a good idea to support case-insensitive boolean values in the code.
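[Editor's note: Zheng's suggestion above could be sketched as follows. This is a hypothetical illustration, not Hive's actual configuration code; the class and method names are made up for the example.]

```java
// Sketch of case-insensitive boolean parsing for config values,
// so that "TRUE", "true", and "True" are all accepted.
public class BoolParse {
    // Returns the parsed boolean, or defaultVal if the value is
    // null or not recognizably "true"/"false" in any case.
    static boolean parseBoolean(String value, boolean defaultVal) {
        if (value == null) {
            return defaultVal;
        }
        String v = value.trim();
        if (v.equalsIgnoreCase("true")) {
            return true;
        }
        if (v.equalsIgnoreCase("false")) {
            return false;
        }
        return defaultVal;
    }

    public static void main(String[] args) {
        System.out.println(parseBoolean("TRUE", false));  // true
        System.out.println(parseBoolean("False", true));  // false
    }
}
```

With parsing like this, the uppercase TRUE from the wiki example would have worked instead of being silently ignored.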
Zheng

On Wed, Feb 17, 2010 at 9:27 AM, Brent Miller <[email protected]> wrote:
> Thanks Adam, that works for me as well.
> It seems that the property for hive.exec.compress.output is case sensitive,
> and when it is set to TRUE (as it is on the compressed storage page on the
> wiki) it is ignored by Hive.
>
> -Brent
>
> On Tue, Feb 16, 2010 at 4:24 PM, Adam O'Donnell <[email protected]> wrote:
>>
>> Adding these to my hive-site.xml file worked fine:
>>
>> <property>
>>   <name>hive.exec.compress.output</name>
>>   <value>true</value>
>>   <description>Compress output</description>
>> </property>
>>
>> <property>
>>   <name>mapred.output.compression.type</name>
>>   <value>BLOCK</value>
>>   <description>Block compression</description>
>> </property>
>>
>> On Tue, Feb 16, 2010 at 1:43 PM, Brent Miller <[email protected]> wrote:
>> > Hello, I've seen issues similar to this one come up once or twice before,
>> > but I haven't ever seen a solution to the problem that I'm having. I was
>> > following the Compressed Storage page on the Hive wiki
>> > (http://wiki.apache.org/hadoop/CompressedStorage) and realized that the
>> > sequence files that are created in the warehouse directory are actually
>> > uncompressed and larger than the originals.
>> >
>> > For example, I have a table 'test1' whose input data looks something like:
>> >
>> > 0,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43
>> > 0,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43
>> > 0,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341
>> > ...
>> > And after creating a second table 'test1_comp' that was created with the
>> > STORED AS SEQUENCEFILE directive and the compression options SET as
>> > described in the wiki, I can look at the resultant sequence files and see
>> > that they're just plain (uncompressed) text:
>> >
>> > SEQ "org.apache.hadoop.io.BytesWritable org.apache.hadoop.io.Text+�c�!Y�M���Z^��=
>> > 80,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43=
>> > 80,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43=
>> > 80,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341=
>> > 80,1369962227,2010/02/01,00:00:00.101,0C030901,4,11360141=
>> > ...
>> >
>> > I've tried messing around with different org.apache.hadoop.io.compress.*
>> > options, but the sequence files always come out uncompressed. Has anybody
>> > ever seen this or know a way to keep the data compressed? Since the input
>> > text is so uniform, we get huge space savings from compression and would
>> > like to store the data this way if possible. I'm using Hadoop 20.1 and
>> > Hive that I checked out from SVN about a week ago.
>> >
>> > Thanks,
>> > Brent
>>
>> --
>> Adam J. O'Donnell, Ph.D.
>> Immunet Corporation
>> Cell: +1 (267) 251-0070
>
--
Yours,
Zheng
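[Editor's note: for readers hitting the same problem, the hive-site.xml properties Adam lists can also be set per session from the Hive CLI before running the INSERT that populates the SEQUENCEFILE table. A sketch of such a session, including the codec property (an additional Hadoop-era setting not shown in the thread; shown here as an assumption, with the lowercase "true" that the thread established is required):]

```sql
-- Session-level equivalents of the hive-site.xml properties above.
-- Note: the value must be lowercase "true"; "TRUE" is ignored.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
-- Assumed additional setting: explicitly pick a codec (GzipCodec here).
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Populate the compressed table after the settings are in effect.
INSERT OVERWRITE TABLE test1_comp SELECT * FROM test1;
```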
