I just corrected the wiki page. It would also be a good idea to support case-insensitive boolean values in the code.
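[Editor's note: Zheng's suggestion above could be sketched as follows. This is a hypothetical illustration, not Hive's actual configuration code; the class and method names are made up for the example.]

```java
// Sketch of case-insensitive boolean parsing for config values,
// so that "TRUE", "true", and "True" are all accepted.
public class BoolParse {
    // Returns the parsed boolean, or defaultVal if the value is
    // null or not recognizably "true"/"false" in any case.
    static boolean parseBoolean(String value, boolean defaultVal) {
        if (value == null) {
            return defaultVal;
        }
        String v = value.trim();
        if (v.equalsIgnoreCase("true")) {
            return true;
        }
        if (v.equalsIgnoreCase("false")) {
            return false;
        }
        return defaultVal;
    }

    public static void main(String[] args) {
        System.out.println(parseBoolean("TRUE", false));  // true
        System.out.println(parseBoolean("False", true));  // false
    }
}
```

With parsing like this, the uppercase TRUE from the wiki example would have worked instead of being silently ignored.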
Zheng

On Wed, Feb 17, 2010 at 9:27 AM, Brent Miller <[email protected]> wrote:
> Thanks Adam, that works for me as well.
> It seems that the property for hive.exec.compress.output is case sensitive,
> and when it is set to TRUE (as it is on the compressed storage page on the
> wiki) it is ignored by Hive.
>
> -Brent
>
> On Tue, Feb 16, 2010 at 4:24 PM, Adam O'Donnell <[email protected]> wrote:
>>
>> Adding these to my hive-site.xml file worked fine:
>>
>> <property>
>>   <name>hive.exec.compress.output</name>
>>   <value>true</value>
>>   <description>Compress output</description>
>> </property>
>>
>> <property>
>>   <name>mapred.output.compression.type</name>
>>   <value>BLOCK</value>
>>   <description>Block compression</description>
>> </property>
>>
>> On Tue, Feb 16, 2010 at 1:43 PM, Brent Miller <[email protected]> wrote:
>> > Hello, I've seen issues similar to this one come up once or twice before,
>> > but I haven't ever seen a solution to the problem that I'm having. I was
>> > following the Compressed Storage page on the Hive wiki
>> > (http://wiki.apache.org/hadoop/CompressedStorage) and realized that the
>> > sequence files that are created in the warehouse directory are actually
>> > uncompressed and larger than the originals.
>> >
>> > For example, I have a table 'test1' whose input data looks something like:
>> >
>> > 0,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43
>> > 0,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43
>> > 0,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341
>> > ...
>> > And after creating a second table 'test1_comp' that was created with the
>> > STORED AS SEQUENCEFILE directive and the compression options SET as
>> > described in the wiki, I can look at the resultant sequence files and see
>> > that they're just plain (uncompressed) text:
>> >
>> > SEQ "org.apache.hadoop.io.BytesWritable org.apache.hadoop.io.Text+�c�!Y�M���Z^��=
>> > 80,1369962224,2010/02/01,00:00:00.101,0C030301,4,0000BD43=
>> > 80,1369962225,2010/02/01,00:00:00.101,0C030501,4,66268E43=
>> > 80,1369962226,2010/02/01,00:00:00.101,0C030701,4,041F3341=
>> > 80,1369962227,2010/02/01,00:00:00.101,0C030901,4,11360141=
>> > ...
>> >
>> > I've tried messing around with different org.apache.hadoop.io.compress.*
>> > options, but the sequence files always come out uncompressed. Has anybody
>> > ever seen this or know a way to keep the data compressed? Since the input
>> > text is so uniform, we get huge space savings from compression and would
>> > like to store the data this way if possible. I'm using Hadoop 20.1 and
>> > Hive that I checked out from SVN about a week ago.
>> >
>> > Thanks,
>> > Brent
>>
>> --
>> Adam J. O'Donnell, Ph.D.
>> Immunet Corporation
>> Cell: +1 (267) 251-0070
>
--
Yours,
Zheng
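[Editor's note: for readers hitting the same problem, the hive-site.xml properties Adam lists can also be set per session from the Hive CLI before running the INSERT that populates the SEQUENCEFILE table. A sketch of such a session, including the codec property (an additional Hadoop-era setting not shown in the thread; shown here as an assumption, with the lowercase "true" that the thread established is required):]

```sql
-- Session-level equivalents of the hive-site.xml properties above.
-- Note: the value must be lowercase "true"; "TRUE" is ignored.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
-- Assumed additional setting: explicitly pick a codec (GzipCodec here).
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Populate the compressed table after the settings are in effect.
INSERT OVERWRITE TABLE test1_comp SELECT * FROM test1;
```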
