Hi,

The size of my gzipped weblog files is about 35MB. However, after enabling block compression and inserting the logs into another Hive table (stored as SequenceFile), the file size bloats to about 233MB. I've done similar processing on a local Hadoop/Hive cluster, and while the compression there is not as good as plain gzipping, it is nowhere near this bad. What could be going wrong?
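For reference, here is roughly what I'm running (the table names weblogs_raw and weblogs_seq are placeholders for my actual tables):

    SET hive.exec.compress.output=true;
    SET mapred.output.compression.type=BLOCK;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

    -- load the raw logs into the SequenceFile-backed table
    INSERT OVERWRITE TABLE weblogs_seq
    SELECT * FROM weblogs_raw;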
I looked at the header of the resulting file and here's what it says:

    SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec

(If I'm decoding that right, the ^A^@ after the value class means the record-compression flag is set but the block-compression flag is not, so the writer may not be block-compressing at all.)

Does Amazon Elastic MapReduce behave differently, or am I doing something wrong?

Saurabh.

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
