Adding these two properties to my hive-site.xml file worked for me:
<property>
<name>hive.exec.compress.output</name>
<value>true</value>
<description>Compress output</description>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>BLOCK</value>
<description>Block compression</description>
</property>
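To check whether a file in the warehouse directory actually came out compressed, the flags in the SequenceFile header can be read directly instead of eyeballing the bytes. Below is a minimal Python sketch, not something shipped with Hive or Hadoop. It assumes the version 6 header layout (the "SEQ" magic, a version byte, the key and value class names, then one byte each for the compression and block-compression flags) and class names shorter than 128 bytes, so that each length prefix is a single byte. The function name read_seqfile_header and the sample bytes are made up for illustration.

```python
import io


def read_seqfile_header(stream):
    """Read the start of a SequenceFile and return its compression flags.

    Assumes the version 6 header layout; other versions are rejected.
    """
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic")
    version = stream.read(1)[0]
    if version != 6:
        raise ValueError("unsupported SequenceFile version: %d" % version)

    def read_short_string():
        # Assumption: the length prefix fits in one byte (0..127),
        # which holds for ordinary Java class names.
        length = stream.read(1)[0]
        if length > 127:
            raise ValueError("multi-byte length prefix not handled")
        return stream.read(length).decode("utf-8")

    key_class = read_short_string()
    value_class = read_short_string()
    compressed = stream.read(1) == b"\x01"
    block_compressed = stream.read(1) == b"\x01"
    # The codec class name is only present when compression is on.
    codec = read_short_string() if compressed else None
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


# Hypothetical sample: a header with the same key and value classes as the
# dump quoted below, and both compression flags switched off.
key = b"org.apache.hadoop.io.BytesWritable"
value = b"org.apache.hadoop.io.Text"
sample = b"SEQ\x06" + bytes([len(key)]) + key + bytes([len(value)]) + value + b"\x00\x00"
print(read_seqfile_header(io.BytesIO(sample)))
```

For a real file, pass a file object opened in binary mode (for example, a part file copied out of HDFS) instead of the BytesIO. With the properties above in effect, both flags should come back True.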
On Tue, Feb 16, 2010 at 1:43 PM, Brent Miller <[email protected]> wrote:
> Hello, I've seen issues similar to this one come up once or twice before,
> but I haven't ever seen a solution to the problem that I'm having. I was
> following the Compressed Storage page on the Hive
> Wiki http://wiki.apache.org/hadoop/CompressedStorage and realized that the
> sequence files that are created in the warehouse directory are actually
> uncompressed and larger than the originals.
> For example, I have a table 'test1' whose input data looks something like:
> 0,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43
> 0,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43
> 0,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341
> ...
> And after creating a second table 'test1_comp' with the
> STORED AS SEQUENCEFILE directive and the compression options SET as
> described in the wiki, I can look at the resultant sequence files and see
> that they're just plain (uncompressed) text:
> SEQ "org.apache.hadoop.io.BytesWritable org.apache.hadoop.io.Text+�c�!Y�M ��
> Z^��= 80,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43=
> 80,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43=
> 80,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341=
> 80,1369962227,2010/02/01,00:00:00.101,0C030901,4,11360141=
> ...
> I've tried messing around with different org.apache.hadoop.io.compress.*
> options, but the sequence files always come out uncompressed. Has anybody
> ever seen this, or does anyone know a way to keep the data compressed? Since the input
> text is so uniform, we get huge space savings from compression and would
> like to store the data this way if possible. I'm using Hadoop 0.20.1 and a
> Hive build that I checked out from SVN about a week ago.
> Thanks,
> Brent
--
Adam J. O'Donnell, Ph.D.
Immunet Corporation
Cell: +1 (267) 251-0070