Moving to mapreduce-user@, bcc common-u...@. Please use the appropriate
project lists for discussions.
----
The default scheduler tries to get all tasks of a single job done
before moving on to the next job of the same 'priority'. However,
whether multiple jobs run in 'parallel' depends on how much of the
cluster's capacity is taken up by the highest-'priority' job at the
head of the queue.
So, the behaviour you are seeing is probably the result of a single
job taking up all of the cluster's capacity in terms of map/reduce
slots.
The capacity-scheduler, when configured appropriately, will enforce
capacity constraints so that, for example, jobs in a single queue cannot
take up more than that queue's capacity; however, you will still have
similar issues among jobs in the same queue. The CS also has user-limits
to ensure a single user doesn't take up all of a queue's capacity, etc.
In your case, you might be able to get away with each of the jobs
going to different queues.
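As an illustration, here is a minimal sketch of what that might look
like on 0.20.x, assuming the CapacityTaskScheduler is already enabled on
the JobTracker; the queue names ('etl' and 'reports') and the
percentages are just placeholders, so please check the
capacity-scheduler documentation for your version:

  <!-- mapred-site.xml: declare the queues -->
  <property>
    <name>mapred.queue.names</name>
    <value>etl,reports</value>
  </property>

  <!-- capacity-scheduler.xml: give each queue a share of the slots,
       and cap how much of a queue a single user can take -->
  <property>
    <name>mapred.capacity-scheduler.queue.etl.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.reports.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.reports.minimum-user-limit-percent</name>
    <value>50</value>
  </property>

Each job then picks its queue via mapred.job.queue.name, e.g.
'pig -Dmapred.job.queue.name=reports script.pig' (how you pass that
property may vary with your Pig version).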
Also, please be aware that the CS in 0.20.2 is quite dated; you might
want to use the Yahoo! GitHub repository
(http://github.com/yahoo/hadoop-common) for the latest version of the
CS.
I'm currently working to get the Yahoo codebase released as an Apache
release (maybe hadoop-0.20-security); once we get that done you should
be able to use the latest CapacityScheduler via an Apache release.
The fair-scheduler tries to fairly share a cluster among applications/
users/pools. Please refer to its documentation for more information.
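For completeness, a minimal fair-scheduler sketch (again, the pool names
and numbers are just placeholders): with the FairScheduler enabled on
the JobTracker, point mapred.fairscheduler.allocation.file at an
allocations file along these lines:

  <?xml version="1.0"?>
  <allocations>
    <pool name="etl">
      <minMaps>10</minMaps>
      <minReduces>5</minReduces>
    </pool>
    <pool name="reports">
      <minMaps>10</minMaps>
      <minReduces>5</minReduces>
    </pool>
  </allocations>

By default jobs land in a pool named after the submitting user
(mapred.fairscheduler.poolnameproperty), so jobs submitted by different
users will already share the cluster.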
Arun
On Sep 13, 2010, at 2:54 PM, Eric Sammer wrote:
The default scheduler in Hadoop is a FIFO scheduler. You can configure
either the Fair Scheduler or Capacity scheduler to allow jobs to run
in parallel and "share" the cluster resources. See
http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html and
http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
respectively.
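Roughly speaking, switching schedulers is a matter of putting the
relevant contrib jar on the JobTracker's classpath and pointing
mapred.jobtracker.taskScheduler at the scheduler class, for example:

  <!-- mapred-site.xml -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
    <!-- or org.apache.hadoop.mapred.CapacityTaskScheduler -->
  </property>

This is only a sketch; please check the docs above for the exact steps
on your release.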
On Mon, Sep 13, 2010 at 5:36 PM, Rahul Malviya <[email protected]>
wrote:
Hi,
I am running Pig jobs on a Hadoop cluster.
I just wanted to know whether I can run multiple jobs on a Hadoop
cluster simultaneously.
Currently, when I start two jobs on Hadoop, they run in a serial
fashion.
Is there a way to run N jobs simultaneously on Hadoop?
Thanks,
Rahul
--
Eric Sammer
twitter: esammer
data: www.cloudera.com