A Map/Reduce task runs in its own JVM and, by default, runs a single map or
reduce thread.

The number of map and reduce task execution slots in a cluster is fixed at
cluster start time (more precisely, when each task tracker starts). This
restriction may be lifted at some point.

It is possible to tell a job to use the multi-threaded map runner
(MultithreadedMapRunner), in which case several threads execute the map
calls of each map task concurrently.

This is ideal when each map method call is not bound by the local machine
resources (disk or CPU). A common case is a map method that makes a call to
a remote server, such as an HTTP request. Care must be taken not to
overwhelm the remote servers.
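
As a rough illustration of why this helps, here is a plain Java sketch (not
Hadoop code) of several threads working through records whose processing
mostly waits on a remote call. The names ThreadedMapSketch, slowLookup and
mapAll are made up for this example, and the 50 ms sleep is an assumed
stand-in for network latency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadedMapSketch {

    // Hypothetical stand-in for a map() call that waits on a remote server.
    static String slowLookup(String key) throws InterruptedException {
        Thread.sleep(50); // assumed network latency, not a measured value
        return key.toUpperCase();
    }

    // Applies slowLookup to every record using a fixed pool of worker threads.
    // Results are returned in the same order as the input records.
    static List<String> mapAll(List<String> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<Future<String>>();
            for (final String record : records) {
                Callable<String> call = () -> slowLookup(record);
                pending.add(pool.submit(call));
            }
            List<String> results = new ArrayList<String>();
            for (Future<String> f : pending) {
                results.add(f.get()); // blocks until that record is done
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("map call failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        for (int threads : new int[] {1, 4}) {
            long start = System.nanoTime();
            List<String> out = mapAll(records, threads);
            long ms = (System.nanoTime() - start) / 1000000L;
            System.out.println(threads + " thread(s): " + out + " in about " + ms + " ms");
        }
    }
}
```

Because each call spends its time sleeping rather than using the CPU, the
4-thread run should finish in roughly a quarter of the time of the 1-thread
run. If the calls were CPU bound, the extra threads would buy little.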

In short, if your map tasks are not saturating your machine, you can use the
multi-threaded mapper in an attempt to reduce the overall run time of your
job.

I have often set my task trackers up to run 1 map task per machine, and
then controlled the concurrency by changing the number of threads used by
the multi-threaded mapper.
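
To size the thread count without overwhelming the remote servers, a simple
upper bound is machines x map slots per machine x threads per task. The
small Java sketch below does that arithmetic. The class ConcurrencyBudget
and the figures in main (10 machines, a limit of 200 concurrent requests)
are hypothetical and only there to show the calculation.

```java
class ConcurrencyBudget {

    // Upper bound on simultaneous remote calls the whole job can make.
    static int peakRequests(int machines, int mapSlotsPerMachine, int threadsPerTask) {
        return machines * mapSlotsPerMachine * threadsPerTask;
    }

    // Largest per-task thread count that stays within the server's limit.
    // Never returns less than 1, so a very small limit can still be exceeded
    // if there are more running map tasks than the limit allows.
    static int threadsPerTask(int machines, int mapSlotsPerMachine, int serverLimit) {
        return Math.max(1, serverLimit / (machines * mapSlotsPerMachine));
    }

    public static void main(String[] args) {
        int machines = 10;      // hypothetical cluster size
        int slots = 1;          // one map task per machine, as described above
        int serverLimit = 200;  // hypothetical limit on concurrent requests
        int threads = threadsPerTask(machines, slots, serverLimit);
        System.out.println(threads + " threads per task, peak "
                + peakRequests(machines, slots, threads) + " concurrent requests");
        // prints "20 threads per task, peak 200 concurrent requests"
    }
}
```

With one map task per machine the thread count is the only knob left, which
is what makes the concurrency easy to reason about.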


On Fri, Sep 25, 2009 at 1:47 PM, Joe <[email protected]> wrote:

> Question 1:
> Can I put a MapReduce task driver in a thread such that it can be launched
> multiple times to execute concurrently with the same program but different
> data input/output?
>
> Question 2:
> What exactly is the advantage of using MultithreadedMapRunner since the
> system will automatically parallelize the task via multiple threads anyway?
> Any detailed example of using MultithreadedMapRunner for reference?
>
> Thanks,
> Joe


-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
