Yes. Each task launches its own JVM. Map-reduce is not generally useful for real-time applications. It is VERY useful for large-scale data reductions done in advance of real-time operations.
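To make that split concrete, here is a minimal Java sketch of the serving side only: the batch job runs ahead of time, and the real-time path is just an in-memory lookup. The class name, the method names and the "user<TAB>item,item" line format are assumptions for illustration (tab-separated key and value is what Hadoop's default text output produces, but your job may write something else). This is not part of Hadoop.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical serving-side table, loaded once from the output of a batch job.
// Assumed line format: userId <TAB> item1,item2,...
class RecommendationLookup {
    private final Map<String, List<String>> table = new HashMap<>();

    // Build the table at startup; this cost is paid once, not per request.
    static RecommendationLookup fromLines(List<String> lines) {
        RecommendationLookup lookup = new RecommendationLookup();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length < 2 || parts[1].isEmpty()) {
                continue; // skip malformed lines rather than failing the load
            }
            lookup.table.put(parts[0], Arrays.asList(parts[1].split(",")));
        }
        return lookup;
    }

    // Request-time path: a single hash lookup, no job submission, no JVM launch.
    List<String> recommend(String userId) {
        List<String> items = table.get(userId);
        return items != null ? items : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        // Stand-in for lines read from the batch job's output files.
        RecommendationLookup lookup = fromLines(Arrays.asList(
                "alice\tbook1,book2",
                "bob\tbook3"));
        System.out.println(lookup.recommend("alice"));
        System.out.println(lookup.recommend("nobody"));
    }
}
```

In a real deployment the lines would be read from the job's output directory and the table would be reloaded whenever a new batch run finishes. Whether a plain in-memory map is big enough depends on the data size.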
The basic issue is that the major performance contribution of map-reduce architectures is large scale sequential access of data stores. That is pretty much in contradiction with real-time response.

On 3/11/08 1:43 PM, "Jason Rennie" <[EMAIL PROTECTED]> wrote:

> Have been working my way through the Map-Reduce tutorial. Just got the
> WordCount example working. One thing that concerns me is the time it took
> to run. 11 seconds is the fastest it's been able to complete after multiple
> tries. I'm investigating Hadoop to distribute a real-time recommendations
> system. I need sub-1-second response times. Startup time is not so
> important. I'm wondering what's taking so long. Appears that slowness is
> actually in the mapping and reducing (see job output below). I'm using
> java-sun-jdk-1.6.0_04. Is each task launching its own jvm? Could that be
> the reason for the slowness?
>
> Jason
>
> 08/03/11 20:34:55 INFO mapred.FileInputFormat: Total input paths to process : 2
> 08/03/11 20:34:55 INFO mapred.JobClient: Running job: job_200803111826_0005
> 08/03/11 20:34:56 INFO mapred.JobClient: map 0% reduce 0%
> 08/03/11 20:35:02 INFO mapred.JobClient: map 66% reduce 0%
> 08/03/11 20:35:04 INFO mapred.JobClient: map 100% reduce 0%
> 08/03/11 20:35:11 INFO mapred.JobClient: map 100% reduce 100%
> 08/03/11 20:35:12 INFO mapred.JobClient: Job complete: job_200803111826_0005
> 08/03/11 20:35:12 INFO mapred.JobClient: Counters: 12
> 08/03/11 20:35:12 INFO mapred.JobClient: Job Counters
> 08/03/11 20:35:12 INFO mapred.JobClient: Launched map tasks=3
> 08/03/11 20:35:12 INFO mapred.JobClient: Launched reduce tasks=1
> 08/03/11 20:35:12 INFO mapred.JobClient: Data-local map tasks=3
> 08/03/11 20:35:12 INFO mapred.JobClient: Map-Reduce Framework
> 08/03/11 20:35:12 INFO mapred.JobClient: Map input records=2
> 08/03/11 20:35:12 INFO mapred.JobClient: Map output records=8
> 08/03/11 20:35:12 INFO mapred.JobClient: Map input bytes=50
> 08/03/11 20:35:12 INFO mapred.JobClient: Map output bytes=82
> 08/03/11 20:35:12 INFO mapred.JobClient: Combine input records=8
> 08/03/11 20:35:12 INFO mapred.JobClient: Combine output records=6
> 08/03/11 20:35:12 INFO mapred.JobClient: Reduce input groups=5
> 08/03/11 20:35:12 INFO mapred.JobClient: Reduce input records=6
> 08/03/11 20:35:12 INFO mapred.JobClient: Reduce output records=5
