Moving to mapreduce-user@, bcc common-u...@. Please use the appropriate
project lists for discussions.

----

The default scheduler tries to get all tasks of a single job done before moving on to the next job of the same 'priority'. However, whether multiple jobs run in 'parallel' depends on how much of the cluster's capacity is taken up by the highest-'priority' job at the head of the queue.

So, the behaviour you are seeing is probably the result of a single job taking up all of the cluster's capacity in terms of map/reduce slots.

The capacity-scheduler, when configured appropriately, will enforce capacity constraints so that, for example, jobs in a single queue cannot take up more than that queue's capacity, but you will have similar issues among jobs within the same queue. The CS also has user-limits to ensure a single user doesn't take up all of a queue's capacity.
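
Just to illustrate (the queue names below are made up, and exact property names may differ slightly between 0.20 releases, so treat this as a sketch rather than a drop-in config), a capacity-scheduler.xml along these lines caps each queue and limits any single user within a queue when there is competition:

  <configuration>
    <!-- the 'etl' queue gets roughly 60% of the cluster's slots -->
    <property>
      <name>mapred.capacity-scheduler.queue.etl.capacity</name>
      <value>60</value>
    </property>
    <!-- the 'adhoc' queue gets the remaining 40% -->
    <property>
      <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
      <value>40</value>
    </property>
    <!-- when users compete within 'etl', cap any one user at ~50% of the queue -->
    <property>
      <name>mapred.capacity-scheduler.queue.etl.minimum-user-limit-percent</name>
      <value>50</value>
    </property>
  </configuration>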

In your case, you might be able to get away with sending each of the jobs to a different queue.
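
As a sketch of that (again assuming the illustrative 'etl' and 'adhoc' queues from above), you'd declare the queues to the JobTracker in mapred-site.xml and then tag each job with a queue name:

  <!-- mapred-site.xml: declare the queues the JobTracker knows about -->
  <property>
    <name>mapred.queue.names</name>
    <value>etl,adhoc</value>
  </property>

  <!-- per-job configuration: pick one of the declared queues -->
  <property>
    <name>mapred.job.queue.name</name>
    <value>adhoc</value>
  </property>

Since you're launching these from Pig, I believe something like 'pig -Dmapred.job.queue.name=adhoc myscript.pig' (or setting that property inside the script) carries the queue name through to the MapReduce jobs Pig submits, though I haven't verified that against your Pig version.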

Also, please be aware that the CS in 0.20.2 is quite dated; you might want to use the Yahoo!
GitHub tree (http://github.com/yahoo/hadoop-common) for the latest version
of the CS.

I'm currently working to get the Yahoo codebase released as an Apache
release (maybe hadoop-0.20-security); once we get that done, you should
be able to use the latest CapacityScheduler via an Apache release.

The fair-scheduler tries to fair-share a cluster among applications/users/pools. Please refer to its documentation for more information.
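
Just as a rough sketch of what the fair-scheduler works from (the pool names below are made up; check the fair-scheduler docs for the exact format in your release), mapred.fairscheduler.allocation.file points it at an allocations file roughly like:

  <?xml version="1.0"?>
  <allocations>
    <!-- guarantee the 'etl' pool some slots and give it double weight -->
    <pool name="etl">
      <minMaps>10</minMaps>
      <minReduces>5</minReduces>
      <weight>2.0</weight>
    </pool>
    <pool name="adhoc">
      <minMaps>5</minMaps>
      <minReduces>2</minReduces>
      <weight>1.0</weight>
    </pool>
  </allocations>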

Arun

On Sep 13, 2010, at 2:54 PM, Eric Sammer wrote:

The default scheduler in Hadoop is a FIFO scheduler. You can configure
either the Fair Scheduler or the Capacity Scheduler to allow jobs to run in
parallel and "share" the cluster resources. See
http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html and
http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html respectively.
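
Roughly speaking, switching the JobTracker off FIFO comes down to one property in mapred-site.xml plus a JobTracker restart (the class names below are as of 0.20, and the scheduler jar from contrib/ also has to be on the JobTracker's classpath, so take this as a sketch rather than a full recipe):

  <!-- mapred-site.xml on the JobTracker -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <!-- or org.apache.hadoop.mapred.CapacityTaskScheduler for the Capacity Scheduler -->
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>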

On Mon, Sep 13, 2010 at 5:36 PM, Rahul Malviya <[email protected]> wrote:

Hi,

I am running Pig jobs on a Hadoop cluster.

I just wanted to know whether I can run multiple jobs on a Hadoop cluster
simultaneously.

Currently, when I start two jobs on Hadoop, they run in a serial fashion.

Is there a way to run N jobs simultaneously on Hadoop?

Thanks,
Rahul




--
Eric Sammer
twitter: esammer
data: www.cloudera.com
