A Map/Reduce task runs in a single JVM, with a single map or reduce thread. The number of map and reduce task execution slots in a cluster is fixed at cluster start time (strictly speaking, when each task tracker starts). This restriction may be lifted at some point.
It is possible to tell a task to use a multi-threaded map runner, in which case multiple threads execute the map task concurrently. This is ideal when each map method call is not bound by local machine resources (disk or CPU); a common case is a map that calls a remote server, for example by making an HTTP request. Care must be taken not to overwhelm the remote servers.

In short, if your map tasks are not saturating your machine, you can use the multi-threaded mapper in an attempt to reduce the overall run time of your job.

I have often set my task trackers up to run 1 map task per machine, and then controlled the concurrency by changing the number of threads via the multi-threaded mapper.

On Fri, Sep 25, 2009 at 1:47 PM, Joe wrote:

> Question 1:
> Can I put a MapReduce task driver in a thread such that it can be launched
> multiple times to execute concurrently with the same program but different
> data input/output?
>
> Question 2:
> What exactly is the advantage of using MultithreadedMapRunner since the
> system will automatically parallelize the task via multiple threads anyway?
> Any detailed example of using MultithreadedMapRunner for reference?
>
> Thanks,
> Joe

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
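
P.S. Since an example was asked for, here is a rough sketch of the idea behind the multi-threaded map runner. It is plain Java, not Hadoop code: the class `MultiThreadedMapSketch`, the `map` stand-in, and the 100 ms latency are all made up for illustration. Each "map call" only waits, as a remote HTTP request would, so a small pool of threads gets through the records faster than one thread, and the pool size caps how many requests hit the remote server at once. In the old `org.apache.hadoop.mapred` API, I believe the real runner is selected with `JobConf.setMapRunnerClass(MultithreadedMapRunner.class)` and sized through the `mapred.map.multithreadedrunner.threads` property, but check that against your Hadoop version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MultiThreadedMapSketch {

    // Hypothetical stand-in for a map() call that waits on a remote server.
    static String map(String record, long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis); // simulated network wait; no local CPU or disk work
        return record.toUpperCase();
    }

    // Runs map() over every record using a fixed pool of worker threads.
    // The pool size limits how many simulated requests are in flight at once.
    static List<String> runMaps(List<String> records, int threads, long latencyMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String record : records) {
                futures.add(pool.submit(() -> map(record, latencyMillis)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // collected in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        // Compare one thread against four threads on the same latency-bound work.
        for (int threads : new int[] {1, 4}) {
            long start = System.nanoTime();
            List<String> out = runMaps(records, threads, 100);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + out + " in about " + elapsedMs + " ms");
        }
    }
}
```

With one thread the waits add up one after another; with four they overlap. If the map calls were CPU-bound instead, the extra threads would mostly just compete for the same cores, which is why this only pays off when the tasks are not saturating the machine.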
