I would first check whether it is really block compression or record compression that is being used. Also, maybe the block size is too small, but I am not sure whether that is tunable in SequenceFile or not.
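For example, something along these lines from the Hive CLI should show whether the output is written with BLOCK or RECORD compression (this is just a sketch; the exact property names depend on your Hadoop/Hive version, so treat them as assumptions to verify):

  -- make sure output compression is on and set to BLOCK rather than RECORD
  -- (RECORD compresses each value separately and compresses much worse)
  SET hive.exec.compress.output=true;
  SET mapred.output.compression.type=BLOCK;
  SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

  -- the SequenceFile compression block size may be adjustable via this
  -- Hadoop property (worth double-checking for your version); larger
  -- blocks generally compress better
  SET io.seqfile.compress.blocksize=1000000;

Running these SETs in the same session before the INSERT into the sequencefile table would rule out the record-compression case.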
Zheng

On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <[email protected]> wrote:
> Hi,
>
> The size of my Gzipped weblog files is about 35MB. However, upon enabling
> block compression, and inserting the logs into another Hive table
> (sequencefile), the file size bloats up to about 233MB. I've done similar
> processing on a local Hadoop/Hive cluster, and while the compression is not
> as good as gzipping, it still is not this bad. What could be going wrong?
>
> I looked at the header of the resulting file and here's what it says:
>
> SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
>
> Does Amazon Elastic MapReduce behave differently or am I doing something
> wrong?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com

--
Yours,
Zheng
