> #2 Compressed logs in textfile tables: 60sec (filesize of 736 MB over 8 compressed files)
> #3 Compressed logs in sequencefile tables: 101sec (filesize of 4,773 MB over 126 compressed files)
>
Why is there such a *big* difference in compression ratios between the gzip utility and Hive?

Uncompressed file size: approx 3500 MB
Gzip utility: approx 250 MB
org.apache.hadoop.io.compress.GzipCodec (BLOCK): approx 1600 MB
org.apache.hadoop.io.compress.DefaultCodec (BLOCK): approx 1700 MB

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
