hive.exec.compress.output controls whether to compress Hive output. (This overrides mapred.output.compress in Hive.)
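For concreteness, a minimal sketch of the two switches at the Hive CLI (exact defaults and precedence vary by Hive/Hadoop version):

    -- Hive-level switch: when true, Hive compresses the job's final output
    SET hive.exec.compress.output=true;

    -- Hadoop-level flag that the Hive setting overrides
    SET mapred.output.compress=true;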
All other compression flags come from Hadoop. Please see
http://hadoop.apache.org/common/docs/r0.18.0/hadoop-default.html

Zheng

On Fri, Feb 19, 2010 at 5:53 AM, Saurabh Nanda <[email protected]> wrote:
> And also hive.exec.compress.*. So that makes it three sets of configuration
> variables:
>
> mapred.output.compress.*
> io.seqfile.compress.*
> hive.exec.compress.*
>
> What's the relationship between these configuration parameters, and which
> ones should I set to achieve a well-compressed output table?
>
> Saurabh.
>
> On Fri, Feb 19, 2010 at 7:16 PM, Saurabh Nanda <[email protected]> wrote:
>>
>> I'm confused here, Zheng. There are two sets of configuration variables:
>> those starting with io.* and those starting with mapred.*. To make sure
>> that the final output table is compressed, which ones do I have to set?
>>
>> Saurabh.
>>
>> On Fri, Feb 19, 2010 at 12:37 AM, Zheng Shao <[email protected]> wrote:
>>>
>>> Did you also:
>>>
>>> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>>>
>>> Zheng
>>>
>>> On Thu, Feb 18, 2010 at 8:25 AM, Saurabh Nanda <[email protected]> wrote:
>>>>
>>>> Hi Zheng,
>>>>
>>>> I cross-checked. I am setting the following in my Hive script before the
>>>> INSERT command:
>>>>
>>>> SET io.seqfile.compression.type=BLOCK;
>>>> SET hive.exec.compress.output=true;
>>>>
>>>> A 132 MB (gzipped) input file, going through a cleanup and getting
>>>> populated into a sequencefile table, is growing to 432 MB. What could be
>>>> going wrong?
>>>>
>>>> Saurabh.
>>>>
>>>> On Wed, Feb 3, 2010 at 2:26 PM, Saurabh Nanda <[email protected]> wrote:
>>>>>
>>>>> Thanks, Zheng. Will do some more tests and get back.
>>>>>
>>>>> Saurabh.
>>>>>
>>>>> On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <[email protected]> wrote:
>>>>>>
>>>>>> I would first check whether it is really block compression or record
>>>>>> compression. Also, maybe the block size is too small, but I am not
>>>>>> sure whether that is tunable in SequenceFile.
>>>>>>
>>>>>> Zheng
>>>>>>
>>>>>> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> The size of my gzipped weblog files is about 35 MB. However, upon
>>>>>>> enabling block compression and inserting the logs into another Hive
>>>>>>> table (sequencefile), the file size bloats up to about 233 MB. I've
>>>>>>> done similar processing on a local Hadoop/Hive cluster, and while the
>>>>>>> compression is not as good as gzipping, it is still not this bad.
>>>>>>> What could be going wrong?
>>>>>>>
>>>>>>> I looked at the header of the resulting file, and here's what it says:
>>>>>>>
>>>>>>> SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
>>>>>>>
>>>>>>> Does Amazon Elastic MapReduce behave differently, or am I doing
>>>>>>> something wrong?
>>>>>>>
>>>>>>> Saurabh.
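Decoding the header quoted above, assuming the standard SequenceFile layout (magic and version byte, length-prefixed key and value class names, two single-byte boolean flags, then the codec class name; this annotation is an interpretation, not part of the original thread):

    SEQ^F                                       magic + format version 6
    "org.apache.hadoop.io.BytesWritable         key class (length-prefixed)
    ^Yorg.apache.hadoop.io.Text                 value class (length-prefixed)
    ^A                                          compressed      = true  (0x01)
    ^@                                          blockCompressed = false (0x00)
    'org.apache.hadoop.io.compress.GzipCodec    codec class (length-prefixed)

If that reading is right, the file is record-compressed rather than block-compressed, which is exactly the distinction Zheng suggests checking: gzipping each short log record separately compresses far worse than gzipping whole blocks, and would account for the bloat.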
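Taken together, the settings discussed in this thread for a block-compressed, gzipped sequencefile table would look like this (a sketch only; the table names are hypothetical, and whether io.seqfile.compression.type or the mapred.* properties takes effect can vary by Hive/Hadoop version):

    SET hive.exec.compress.output=true;
    SET io.seqfile.compression.type=BLOCK;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

    -- hypothetical table names, for illustration only
    INSERT OVERWRITE TABLE weblogs_seq
    SELECT * FROM weblogs_raw;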
