Thanks, Zheng. Will do some more tests and get back.

Saurabh.
On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <[email protected]> wrote:
> I would first check whether it is really block compression or record
> compression. Also, maybe the block size is too small, but I am not sure
> whether that is tunable in SequenceFile.
>
> Zheng
>
> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <[email protected]> wrote:
> > Hi,
> >
> > The size of my gzipped weblog files is about 35MB. However, upon
> > enabling block compression and inserting the logs into another Hive
> > table (stored as SequenceFile), the file size bloats up to about 233MB.
> > I've done similar processing on a local Hadoop/Hive cluster, and while
> > the compression is not as good as gzipping, it is still not this bad.
> > What could be going wrong?
> >
> > I looked at the header of the resulting file, and here's what it says:
> >
> > SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
> >
> > Does Amazon Elastic MapReduce behave differently, or am I doing
> > something wrong?
> >
> > Saurabh.
> >
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
>
> --
> Yours,
> Zheng

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
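[Editor's note: the header Saurabh pasted can be decoded by hand. In a SequenceFile header, the two bytes after the key and value class names are boolean flags for "compressed" and "block compressed". The ^A^@ in the dump reads as compressed=1, block-compressed=0, i.e. record compression, which would confirm Zheng's first suspicion: each row is gzipped individually, which compresses far worse than gzipping the whole file. The same check can be done programmatically; a minimal sketch against the Hadoop 0.20-era API (the input path and class name are illustrative, not from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class SeqFileCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point this at one of the files under the Hive table's location.
        Path file = new Path(args[0]);
        FileSystem fs = file.getFileSystem(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
        try {
            // RECORD compression: compressed=true, blockCompressed=false
            // BLOCK compression:  compressed=true, blockCompressed=true
            System.out.println("compressed:       " + reader.isCompressed());
            System.out.println("block compressed: " + reader.isBlockCompressed());
            System.out.println("codec:            " + reader.getCompressionCodec());
        } finally {
            reader.close();
        }
    }
}]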
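[Editor's note: for reference, the usual recipe for block-compressed SequenceFile output from a Hive session looks like the following. This is a sketch; the property names are from the Hadoop 0.20/Hive 0.x era and may behave differently on Amazon Elastic MapReduce:

SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

The block size Zheng mentions is in fact tunable: io.seqfile.compress.blocksize (default 1000000 bytes) sets the minimum amount of data buffered before a block is compressed, so raising it generally improves the compression ratio.]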
