If you follow Approach #3, you should have 8 big compressed sequencefiles instead of 126 small files.
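For reference, a minimal HiveQL sketch of Approach #3 as described in this thread. The table names (raw, raw_sequence) and the input path are hypothetical placeholders, not anything from your setup. The two SET lines are the settings discussed in this message. I would expect 8 output files because gzip input is not splittable, so each of the 8 input files should get one mapper and a map-only insert writes one file per mapper; that is an assumption about your job, so verify it.

```sql
-- Hypothetical staging table over the gzip-compressed text logs
CREATE TABLE raw (line STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

-- Hypothetical target table stored as a sequencefile
CREATE TABLE raw_sequence (line STRING)
  STORED AS SEQUENCEFILE;

-- a) import gzip files into the textfile table (path is a placeholder)
LOAD DATA LOCAL INPATH '/tmp/weblogs/access.log.gz' INTO TABLE raw;

-- b) compress the output, using BLOCK rather than per-record compression
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;

-- c) insert into the sequencefile table
INSERT OVERWRITE TABLE raw_sequence SELECT * FROM raw;
```

Setting these per session with SET should behave the same as putting the property in the site XML, just scoped to the current session.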
By the way, you probably didn't set the compression type to BLOCK
compression; otherwise sequencefile compression wouldn't perform like
that. Try setting this in your hive-site.xml or hadoop-site.xml:

  <property>
    <name>io.seqfile.compression.type</name>
    <value>BLOCK</value>
  </property>

See http://blog.foofactory.fi/2006/12/my-fellow-nutch-developer-andrzej.html

Zheng

On Sun, Jul 26, 2009 at 10:05 PM, Saurabh Nanda<[email protected]> wrote:
>
>> Can you help put that information into appropriate place on the wiki
>> (where you see fit)?
>> Thanks for the help.
>
> Will do.
>
>>
>> By the way, I guess we need to debug what went wrong with the
>> "count(1)" queries. There is definitely something going wrong.
>
> My bad here. I think I forgot to import some files when running the queries
> earlier. The counts are exactly the same. However the timings for "select
> count(1)" queries are very different.
>
> #1 Uncompressed logs in textfile tables: 106sec (filesize of 7,686 MB over 8
> uncompressed files)
> #2 Compressed logs in textfile tables: 60sec (filesize of 736 MB over 8
> compressed files)
> #3 Compressed logs in sequencefile tables: 101sec (filesize of 4,773 MB over
> 126 compressed files)
>
>
>>
>> For the timing, how much mapper slots do you have in your cluster?
>
> I have a 4-node cluster with mapred.reduce.tasks=17 Is that what you mean by
> mapper slots?
>
>>
>> Approach #3:
>> a) import gzip files into textfile table
>> b) set hive.exec.compress.output to true
>> c) inserted into sequencefile table
>> This will create bigger sequencefiles which will help reducing the
>> overhead. This is better than Approach #2 because jobs from the
>> sequencefile tables will have more mappers.
>
> This is exactly what I did in #3 above. But, from those benchmarks #2 seems
> to give the best results, both, in terms of file size and speed. Is that not
> what you were expecting?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>

--
Yours,
Zheng
