I have been working my way through the Map-Reduce tutorial and just got the
WordCount example working.  One thing that concerns me is how long it took
to run: 11 seconds is the fastest it has completed over multiple tries.
I'm investigating Hadoop as a way to distribute a real-time recommendations
system, and I need sub-1-second response times.  Startup time is not so
important.  I'm wondering what's taking so long.  It appears that the
slowness is actually in the mapping and reducing (see job output below).
I'm using java-sun-jdk-1.6.0_04.  Is each task launching its own JVM?  Could
that be the reason for the slowness?

Jason

08/03/11 20:34:55 INFO mapred.FileInputFormat: Total input paths to process : 2
08/03/11 20:34:55 INFO mapred.JobClient: Running job: job_200803111826_0005
08/03/11 20:34:56 INFO mapred.JobClient:  map 0% reduce 0%
08/03/11 20:35:02 INFO mapred.JobClient:  map 66% reduce 0%
08/03/11 20:35:04 INFO mapred.JobClient:  map 100% reduce 0%
08/03/11 20:35:11 INFO mapred.JobClient:  map 100% reduce 100%
08/03/11 20:35:12 INFO mapred.JobClient: Job complete: job_200803111826_0005
08/03/11 20:35:12 INFO mapred.JobClient: Counters: 12
08/03/11 20:35:12 INFO mapred.JobClient:   Job Counters
08/03/11 20:35:12 INFO mapred.JobClient:     Launched map tasks=3
08/03/11 20:35:12 INFO mapred.JobClient:     Launched reduce tasks=1
08/03/11 20:35:12 INFO mapred.JobClient:     Data-local map tasks=3
08/03/11 20:35:12 INFO mapred.JobClient:   Map-Reduce Framework
08/03/11 20:35:12 INFO mapred.JobClient:     Map input records=2
08/03/11 20:35:12 INFO mapred.JobClient:     Map output records=8
08/03/11 20:35:12 INFO mapred.JobClient:     Map input bytes=50
08/03/11 20:35:12 INFO mapred.JobClient:     Map output bytes=82
08/03/11 20:35:12 INFO mapred.JobClient:     Combine input records=8
08/03/11 20:35:12 INFO mapred.JobClient:     Combine output records=6
08/03/11 20:35:12 INFO mapred.JobClient:     Reduce input groups=5
08/03/11 20:35:12 INFO mapred.JobClient:     Reduce input records=6
08/03/11 20:35:12 INFO mapred.JobClient:     Reduce output records=5
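
To see roughly where the time goes, the gaps between the progress lines in
the output above can be computed from their timestamps.  The following is
only a minimal Python sketch: the names parse_events and phase_gaps are made
up for illustration, and it assumes the timestamps are in yy/MM/dd HH:mm:ss
form, which is a guess consistent with the job ID 200803111826.

```python
from datetime import datetime

# Progress lines copied from the job output above.
LOG = """\
08/03/11 20:34:55 INFO mapred.JobClient: Running job: job_200803111826_0005
08/03/11 20:34:56 INFO mapred.JobClient:  map 0% reduce 0%
08/03/11 20:35:02 INFO mapred.JobClient:  map 66% reduce 0%
08/03/11 20:35:04 INFO mapred.JobClient:  map 100% reduce 0%
08/03/11 20:35:11 INFO mapred.JobClient:  map 100% reduce 100%
08/03/11 20:35:12 INFO mapred.JobClient: Job complete: job_200803111826_0005
"""


def parse_events(log):
    """Return a list of (timestamp, message) pairs, one per log line."""
    events = []
    for line in log.splitlines():
        # Assumption: the first 17 characters are "yy/MM/dd HH:mm:ss".
        stamp = datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")
        message = line.split("mapred.JobClient:", 1)[1].strip()
        events.append((stamp, message))
    return events


def phase_gaps(events):
    """Return (before, after, seconds) for each pair of consecutive events."""
    gaps = []
    for (t0, m0), (t1, m1) in zip(events, events[1:]):
        gaps.append((m0, m1, int((t1 - t0).total_seconds())))
    return gaps


events = parse_events(LOG)
gaps = phase_gaps(events)
for before, after, seconds in gaps:
    print(f"{seconds:2d}s  {before}  ->  {after}")

# Wall-clock time from "Running job" to "Job complete".
total = int((events[-1][0] - events[0][0]).total_seconds())
print(f"{total:2d}s  total")
```

The client only logs progress at one-second resolution, so these gaps are
coarse and say nothing about what happens inside each task.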


-- 
Jason Rennie
Head of Machine Learning Technologies, StyleFeeder
http://www.stylefeeder.com/
Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
