Moving mapreduce-user@, bcc common-u...@. Please use the appropriate
project lists for discussions.
On Sep 13, 2010, at 1:47 AM, aniket ray wrote:
I see that the reduces of Queue 1 don't start until the maps of Job 2 are over
(even though the maps of Job 1 are complete). I am not able to understand
this behavior and feel that this may be a configuration issue that I am
missing. Since they are independent tasks and capacity is free, shouldn't
the reduce tasks of Queue 1 kick in?
This shouldn't happen in the CS (CapacityScheduler). Maybe you are hitting
'slowstart' for the reduces of Job1?
From src/mapred/mapred-default.xml:
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
  <description>Fraction of the number of maps in the job which should be
  complete before reduces are scheduled for the job.
  </description>
</property>
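With the default of 0.05, a job with 200 maps becomes eligible to schedule reduces once 10 of its maps (0.05 x 200) have completed. To test the slowstart hypothesis you can override the property rather than edit mapred-default.xml. The fragment below is a minimal sketch, assuming the override goes in mapred-site.xml or the job's own configuration; the value 0.50 is purely illustrative. A value of 1.00 would hold a job's reduces back until all of its maps finish, and lower values let them be scheduled earlier.

```xml
<!-- Illustrative override; 0.50 is an example value, not a recommendation -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.50</value>
</property>
```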
I've been trying to run some map/reduce jobs in parallel using the capacity
scheduler on Hadoop 0.20.2.
The CS in 0.20.2 is quite dated, you might want to use the Yahoo!
GitHub (http://github.com/yahoo/hadoop-common) for the latest version
of the CS.
I'm currently working to get the Yahoo codebase released as an Apache
Release (maybe hadoop-0.20-security), once we get that done you should
be able to use the latest CapacityScheduler via an Apache Release.
Arun