I have been working my way through the Map-Reduce tutorial and just got the WordCount example working. One thing that concerns me is how long it took to run: 11 seconds is the fastest it has completed after multiple tries. I'm investigating Hadoop as a way to distribute a real-time recommendations system, and I need sub-1-second response times. Startup time is not so important.

I'm wondering what's taking so long. It appears that the slowness is actually in the mapping and reducing (see job output below). I'm using java-sun-jdk-1.6.0_04. Is each task launching its own JVM? Could that be the reason for the slowness?
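For reference, the rough phase durations can be read off the JobClient timestamps in the job output below. This is only a minimal sketch, assuming the yy/MM/dd HH:mm:ss layout seen in that output; the class and method names (PhaseTimes, secondsBetween) are made up for illustration and are not part of Hadoop. The progress lines are coarse markers, so the numbers are approximate.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class PhaseTimes {
    // Whole seconds between two JobClient log timestamps.
    // Assumed layout (taken from the log lines): yy/MM/dd HH:mm:ss
    public static long secondsBetween(String start, String end) {
        SimpleDateFormat fmt = new SimpleDateFormat("yy/MM/dd HH:mm:ss");
        // Fixed zone so a DST change cannot skew the difference.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return (fmt.parse(end).getTime() - fmt.parse(start).getTime()) / 1000L;
        } catch (ParseException e) {
            throw new IllegalArgumentException("unexpected timestamp format", e);
        }
    }

    public static void main(String[] args) {
        String running    = "08/03/11 20:34:56"; // map 0% reduce 0%
        String mapsDone   = "08/03/11 20:35:04"; // map 100% reduce 0%
        String reduceDone = "08/03/11 20:35:11"; // map 100% reduce 100%

        System.out.println("map phase:    " + secondsBetween(running, mapsDone) + " s");
        System.out.println("reduce phase: " + secondsBetween(mapsDone, reduceDone) + " s");
    }
}
```

For this run that works out to about 8 seconds for the map phase and 7 seconds for the reduce phase, on 50 bytes of input.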
Jason

08/03/11 20:34:55 INFO mapred.FileInputFormat: Total input paths to process : 2
08/03/11 20:34:55 INFO mapred.JobClient: Running job: job_200803111826_0005
08/03/11 20:34:56 INFO mapred.JobClient: map 0% reduce 0%
08/03/11 20:35:02 INFO mapred.JobClient: map 66% reduce 0%
08/03/11 20:35:04 INFO mapred.JobClient: map 100% reduce 0%
08/03/11 20:35:11 INFO mapred.JobClient: map 100% reduce 100%
08/03/11 20:35:12 INFO mapred.JobClient: Job complete: job_200803111826_0005
08/03/11 20:35:12 INFO mapred.JobClient: Counters: 12
08/03/11 20:35:12 INFO mapred.JobClient: Job Counters
08/03/11 20:35:12 INFO mapred.JobClient: Launched map tasks=3
08/03/11 20:35:12 INFO mapred.JobClient: Launched reduce tasks=1
08/03/11 20:35:12 INFO mapred.JobClient: Data-local map tasks=3
08/03/11 20:35:12 INFO mapred.JobClient: Map-Reduce Framework
08/03/11 20:35:12 INFO mapred.JobClient: Map input records=2
08/03/11 20:35:12 INFO mapred.JobClient: Map output records=8
08/03/11 20:35:12 INFO mapred.JobClient: Map input bytes=50
08/03/11 20:35:12 INFO mapred.JobClient: Map output bytes=82
08/03/11 20:35:12 INFO mapred.JobClient: Combine input records=8
08/03/11 20:35:12 INFO mapred.JobClient: Combine output records=6
08/03/11 20:35:12 INFO mapred.JobClient: Reduce input groups=5
08/03/11 20:35:12 INFO mapred.JobClient: Reduce input records=6
08/03/11 20:35:12 INFO mapred.JobClient: Reduce output records=5

--
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
