Thanks Adam, that works for me as well. It seems that the value of hive.exec.compress.output is case sensitive: when it is set to TRUE (as it is on the Compressed Storage page on the wiki), Hive ignores it.
-Brent

On Tue, Feb 16, 2010 at 4:24 PM, Adam O'Donnell <[email protected]> wrote:
> Adding these to my hive-site.xml file worked fine:
>
> <property>
>   <name>hive.exec.compress.output</name>
>   <value>true</value>
>   <description>Compress output</description>
> </property>
>
> <property>
>   <name>mapred.output.compression.type</name>
>   <value>BLOCK</value>
>   <description>Block compression</description>
> </property>
>
>
> On Tue, Feb 16, 2010 at 1:43 PM, Brent Miller <[email protected]> wrote:
> > Hello, I've seen issues similar to this one come up once or twice
> > before, but I haven't ever seen a solution to the problem that I'm
> > having. I was following the Compressed Storage page on the Hive Wiki
> > http://wiki.apache.org/hadoop/CompressedStorage and realized that the
> > sequence files that are created in the warehouse directory are
> > actually uncompressed and larger than the originals.
> >
> > For example, I have a table 'test1' whose input data looks something
> > like:
> >
> > 0,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43
> > 0,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43
> > 0,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341
> > ...
> >
> > And after creating a second table 'test1_comp' that was created with
> > the STORED AS SEQUENCEFILE directive and the compression options SET
> > as described in the wiki, I can look at the resultant sequence files
> > and see that they're just plain (uncompressed) text:
> >
> > SEQ "org.apache.hadoop.io.BytesWritable org.apache.hadoop.io.Text+�c�!Y�M ��
> > Z^��=
> > 80,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43=
> > 80,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43=
> > 80,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341=
> > 80,1369962227,2010/02/01,00:00:00.101,0C030901,4,11360141=
> > ...
> >
> > I've tried messing around with different
> > org.apache.hadoop.io.compress.* options, but the sequence files
> > always come out uncompressed. Has anybody ever seen this or know a
> > way to keep the data compressed? Since the input text is so uniform,
> > we get huge space savings from compression and would like to store
> > the data this way if possible. I'm using Hadoop 20.1 and Hive that I
> > checked out from SVN about a week ago.
> >
> > Thanks,
> > Brent
>
>
>
> --
> Adam J. O'Donnell, Ph.D.
> Immunet Corporation
> Cell: +1 (267) 251-0070
>
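
A less error-prone way to check whether a warehouse file really came out compressed, rather than eyeballing the raw bytes, might be to read the flags in the SequenceFile header. The following is only a minimal Python sketch and not part of Hive or Hadoop: the function names are made up for illustration, and it assumes the version 6 header layout (the `SEQ` magic, a version byte, the key and value class names, a compressed flag, a block-compressed flag, then the codec class name when the compressed flag is set) and class names shorter than 128 bytes.

```python
import io


def _read_short_string(stream):
    """Read a length-prefixed UTF-8 string (single-byte length only)."""
    prefix = stream.read(1)
    if len(prefix) != 1:
        raise ValueError("truncated header")
    length = prefix[0]
    if length > 127:
        # Longer strings use a multi-byte length encoding, not handled here.
        raise ValueError("multi-byte string length not supported in this sketch")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("truncated header")
    return data.decode("utf-8")


def _read_flag(stream):
    """Read a one-byte boolean: zero is False, anything else is True."""
    flag = stream.read(1)
    if len(flag) != 1:
        raise ValueError("truncated header")
    return flag != b"\x00"


def describe_sequence_file_header(stream):
    """Return the header fields of a SequenceFile opened in binary mode."""
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    if stream.read(1) != b"\x06":
        # Assumption: only the version 6 layout is handled here.
        raise ValueError("only version 6 headers are handled in this sketch")
    key_class = _read_short_string(stream)
    value_class = _read_short_string(stream)
    compressed = _read_flag(stream)
    block_compressed = _read_flag(stream)
    # The codec class name is only present when the compressed flag is set.
    codec = _read_short_string(stream) if compressed else None
    return {
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


def describe_header_bytes(data):
    """Convenience wrapper for header bytes already held in memory."""
    return describe_sequence_file_header(io.BytesIO(data))
```

To try it, copy one of the files out of the warehouse directory to local disk, open it with `open(path, "rb")`, and pass the file object to `describe_sequence_file_header`. If `compressed` comes back `False`, the compression settings were not picked up when the table was written.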
