> #1 Uncompressed logs in textfile tables: 106sec (filesize of 7,686 MB over
> 8 uncompressed files)
> #2 Compressed logs in textfile tables: 60sec (filesize of 736 MB over 8
> compressed files)
> #3 Compressed logs in sequencefile tables: 101sec (filesize of 4,773 MB
> over 126 compressed files)
>

Some more stats, if anyone's interested. I ran all three tables
(described above) through my ETL query (described at
http://nandz.blogspot.com/2009/07/using-hive-for-weblog-analysis.html):

#1: 699sec with 1,561,633 rows in the final table
#2: 563sec with 1,561,633 rows in the final table
#3: 697sec with 1,654,291 rows in the final table (!)

For #3 I got a different row count. I tried importing the gzipped files and
putting them through the ETL again, and ended up with 1,743,377 rows the
second time! I'll spend some more time figuring out where I'm going wrong.

However, going by these stats, approach #2 seems to give the best results
with complex queries.
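
To put a rough number on that, here is a small sketch that turns the three
ETL timings above into a percentage saving relative to #1. The variable
names are just for illustration, and the #3 figure should be read with
caution since its row count doesn't match the other two.

```python
# ETL wall-clock times in seconds, copied from the figures above.
etl_seconds = {"#1": 699, "#2": 563, "#3": 697}

# Use the uncompressed textfile run (#1) as the baseline.
baseline = etl_seconds["#1"]

# Percentage of time saved relative to the baseline, to one decimal place.
savings = {
    label: round(100 * (baseline - secs) / baseline, 1)
    for label, secs in etl_seconds.items()
}

for label, pct in savings.items():
    print(f"{label}: {etl_seconds[label]}s, {pct}% less time than #1")
```

This is a single run per table, so the differences (especially the small
one between #1 and #3) may well be within noise.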

#1 = Uncompressed log files into uncompressed textfile tables
#2 = Inserting #1 with compression on into sequencefile tables
#3 = Compressed log files (gzip) into textfile tables

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
