Re: performance

Ted Dunning Tue, 11 Mar 2008 14:19:30 -0700

Yes.  Each task is launching a JVM.

Map-reduce is not generally useful for real-time applications.  It is VERY
useful for large-scale data reductions done in advance of real-time
operations.
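The split described above (heavy reduction done offline, cheap lookups at request time) can be sketched in plain Java. This is a hypothetical illustration, not Hadoop code. `batchReduce` stands in for a map-reduce job (here a word count like the tutorial's WordCount), and `lookup` stands in for the real-time serving path. Both names are made up for this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrecomputeThenServe {

    // Batch step (hypothetical stand-in for a map-reduce job): scan every
    // record sequentially and reduce the data to a compact per-key summary.
    // In a real deployment this would run well before any request arrives.
    static Map<String, Integer> batchReduce(List<String> records) {
        Map<String, Integer> counts = new HashMap<>();
        for (String record : records) {
            for (String token : record.split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Real-time step: answer a request with one hash lookup against the
    // precomputed table, with no scanning and no job or task launch.
    static int lookup(Map<String, Integer> table, String key) {
        return table.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        List<String> records = List.of("hello world", "goodbye world");
        Map<String, Integer> table = batchReduce(records);
        System.out.println(lookup(table, "world"));   // prints 2
        System.out.println(lookup(table, "missing")); // prints 0
    }
}
```

In a real system the table would be written out by the batch job and loaded by the serving process. The point is that the request path never scans the data store or starts tasks.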


The basic issue is that the main performance benefit of map-reduce
architectures comes from large-scale sequential access to data stores.  That
is pretty much in contradiction with real-time response, which needs
low-latency access to small amounts of data.


On 3/11/08 1:43 PM, "Jason Rennie" <[EMAIL PROTECTED]> wrote:

> Have been working my way through the Map-Reduce tutorial.  Just got the
> WordCount example working.  One thing that concerns me is the time it took
> to run.  11 seconds is the fastest it's been able to complete after multiple
> tries.  I'm investigating Hadoop to distribute a real-time recommendations
> system.  I need sub-1-second response times.  Startup time is not so
> important.  I'm wondering what's taking so long.  Appears that slowness is
> actually in the mapping and reducing (see job output below).  I'm using
> java-sun-jdk-1.6.0_04.  Is each task launching its own jvm?  Could that be
> the reason for the slowness?
> 
> Jason
> 
> 08/03/11 20:34:55 INFO mapred.FileInputFormat: Total input paths to process : 2
> 08/03/11 20:34:55 INFO mapred.JobClient: Running job: job_200803111826_0005
> 08/03/11 20:34:56 INFO mapred.JobClient:  map 0% reduce 0%
> 08/03/11 20:35:02 INFO mapred.JobClient:  map 66% reduce 0%
> 08/03/11 20:35:04 INFO mapred.JobClient:  map 100% reduce 0%
> 08/03/11 20:35:11 INFO mapred.JobClient:  map 100% reduce 100%
> 08/03/11 20:35:12 INFO mapred.JobClient: Job complete: job_200803111826_0005
> 08/03/11 20:35:12 INFO mapred.JobClient: Counters: 12
> 08/03/11 20:35:12 INFO mapred.JobClient:   Job Counters
> 08/03/11 20:35:12 INFO mapred.JobClient:     Launched map tasks=3
> 08/03/11 20:35:12 INFO mapred.JobClient:     Launched reduce tasks=1
> 08/03/11 20:35:12 INFO mapred.JobClient:     Data-local map tasks=3
> 08/03/11 20:35:12 INFO mapred.JobClient:   Map-Reduce Framework
> 08/03/11 20:35:12 INFO mapred.JobClient:     Map input records=2
> 08/03/11 20:35:12 INFO mapred.JobClient:     Map output records=8
> 08/03/11 20:35:12 INFO mapred.JobClient:     Map input bytes=50
> 08/03/11 20:35:12 INFO mapred.JobClient:     Map output bytes=82
> 08/03/11 20:35:12 INFO mapred.JobClient:     Combine input records=8
> 08/03/11 20:35:12 INFO mapred.JobClient:     Combine output records=6
> 08/03/11 20:35:12 INFO mapred.JobClient:     Reduce input groups=5
> 08/03/11 20:35:12 INFO mapred.JobClient:     Reduce input records=6
> 08/03/11 20:35:12 INFO mapred.JobClient:     Reduce output records=5
>
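The job log gives enough to see where the time goes. The sketch below uses only `java.time` from the JDK to compute phase durations from the JobClient timestamps of the run shown. The log has one-second resolution, so the figures are approximate, and the "reduce" interval also covers the shuffle. With only 50 bytes of input, essentially all of the roughly 17 seconds is per-task framework overhead, including the per-task JVM launch mentioned above, rather than computation.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class JobTiming {
    // Timestamp format at the start of each JobClient log line (yy/MM/dd).
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");

    // Whole seconds elapsed between two log timestamps.
    static long secondsBetween(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, FMT),
                                LocalDateTime.parse(end, FMT)).getSeconds();
    }

    public static void main(String[] args) {
        String submitted  = "08/03/11 20:34:55"; // "Running job"
        String mapStart   = "08/03/11 20:34:56"; // map 0% reduce 0%
        String mapDone    = "08/03/11 20:35:04"; // map 100% reduce 0%
        String reduceDone = "08/03/11 20:35:11"; // map 100% reduce 100%
        String complete   = "08/03/11 20:35:12"; // "Job complete"

        long total = secondsBetween(submitted, complete);
        long map = secondsBetween(mapStart, mapDone);
        long reduce = secondsBetween(mapDone, reduceDone);
        // prints: total=17s map=8s reduce=7s
        System.out.println("total=" + total + "s map=" + map + "s reduce=" + reduce + "s");
    }
}
```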
