Moving to mapreduce-user@, bcc common-u...@. Please use the appropriate
project lists for discussions.

----

The default scheduler tries to get all tasks of a single job done before moving on to the next job of the same 'priority'. However, whether multiple jobs run in 'parallel' depends on how much of the cluster's capacity is taken up by the highest-'priority' job at the head of the queue.

So, the behaviour you are seeing is probably the result of a single job taking up all of the cluster's capacity in terms of map/reduce slots.

The capacity-scheduler, when configured appropriately, will enforce capacity constraints so that, for example, jobs in a single queue cannot take up more than that queue's capacity, but you will have similar issues among jobs within the same queue. The CS also has user-limits to ensure a single user doesn't take up all of a queue's capacity.
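
Just to illustrate (the queue names below are made up, and exact property names may differ slightly between 0.20 releases, so treat this as a sketch rather than a drop-in config), a capacity-scheduler.xml along these lines caps each queue and limits any single user within a queue when there is competition:

  <configuration>
    <!-- the 'etl' queue gets roughly 60% of the cluster's slots -->
    <property>
      <name>mapred.capacity-scheduler.queue.etl.capacity</name>
      <value>60</value>
    </property>
    <!-- the 'adhoc' queue gets the remaining 40% -->
    <property>
      <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
      <value>40</value>
    </property>
    <!-- when users compete within 'etl', cap any one user at ~50% of the queue -->
    <property>
      <name>mapred.capacity-scheduler.queue.etl.minimum-user-limit-percent</name>
      <value>50</value>
    </property>
  </configuration>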

In your case, you might be able to get away with sending each of the jobs to a different queue.
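
As a sketch of that (again assuming the illustrative 'etl' and 'adhoc' queues from above), you'd declare the queues to the JobTracker in mapred-site.xml and then tag each job with a queue name:

  <!-- mapred-site.xml: declare the queues the JobTracker knows about -->
  <property>
    <name>mapred.queue.names</name>
    <value>etl,adhoc</value>
  </property>

  <!-- per-job configuration: pick one of the declared queues -->
  <property>
    <name>mapred.job.queue.name</name>
    <value>adhoc</value>
  </property>

Since you're launching these from Pig, I believe something like 'pig -Dmapred.job.queue.name=adhoc myscript.pig' (or setting that property inside the script) carries the queue name through to the MapReduce jobs Pig submits, though I haven't verified that against your Pig version.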

Also, please be aware that the CS in 0.20.2 is quite dated; you might want to use the Yahoo!
GitHub tree (http://github.com/yahoo/hadoop-common) for the latest version
of the CS.

I'm currently working to get the Yahoo codebase released as an Apache
release (maybe hadoop-0.20-security); once we get that done, you should
be able to use the latest CapacityScheduler via an Apache release.

The fair-scheduler tries to fair-share a cluster among applications/users/pools. Please refer to its documentation for more information.
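
Just as a rough sketch of what the fair-scheduler works from (the pool names below are made up; check the fair-scheduler docs for the exact format in your release), mapred.fairscheduler.allocation.file points it at an allocations file roughly like:

  <?xml version="1.0"?>
  <allocations>
    <!-- guarantee the 'etl' pool some slots and give it double weight -->
    <pool name="etl">
      <minMaps>10</minMaps>
      <minReduces>5</minReduces>
      <weight>2.0</weight>
    </pool>
    <pool name="adhoc">
      <minMaps>5</minMaps>
      <minReduces>2</minReduces>
      <weight>1.0</weight>
    </pool>
  </allocations>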

Arun

On Sep 13, 2010, at 2:54 PM, Eric Sammer wrote:

The default scheduler in Hadoop is a FIFO scheduler. You can configure
either the Fair Scheduler or the Capacity Scheduler to allow jobs to run in
parallel and "share" the cluster resources. See
http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html and
http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html respectively.
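
Roughly speaking, switching the JobTracker off FIFO comes down to one property in mapred-site.xml plus a JobTracker restart (the class names below are as of 0.20, and the scheduler jar from contrib/ also has to be on the JobTracker's classpath, so take this as a sketch rather than a full recipe):

  <!-- mapred-site.xml on the JobTracker -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <!-- or org.apache.hadoop.mapred.CapacityTaskScheduler for the Capacity Scheduler -->
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>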

On Mon, Sep 13, 2010 at 5:36 PM, Rahul Malviya <[email protected]> wrote:

Hi,

I am running Pig jobs on a Hadoop cluster.

I just wanted to know whether I can run multiple jobs on a Hadoop cluster
simultaneously.

Currently, when I start two jobs on Hadoop, they run in a serial fashion.

Is there a way to run N jobs simultaneously on Hadoop?

Thanks,
Rahul




--
Eric Sammer
twitter: esammer
data: www.cloudera.com
