Perhaps we should consider separating the copy phase of the reducer
from the execution phase, and exempting the copy phase from the reduce
task limit?

This is a confusing issue, but more importantly, the file copy phase
uses few resources compared with the reduce phase itself (think of
the memory and CPU that go into sorting and the reducer).
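
To make the idea concrete, here is a rough sketch of the slot accounting
I have in mind. It is not Hadoop code: the function name, the
copying/executing split, and the limit constant are all hypothetical,
invented only to illustrate the proposal.

```python
def reduce_slots_in_use(copying, executing, exempt_copy_phase):
    """Count the reducers charged against a node's reduce task limit.

    copying   -- reducers still fetching map output (hypothetical state)
    executing -- reducers that are sorting or reducing (hypothetical state)
    """
    if exempt_copy_phase:
        # Proposed behaviour: only the expensive phase uses up a slot.
        return executing
    # Behaviour as I understand it today: every launched reducer uses a slot.
    return copying + executing


REDUCE_LIMIT = 2  # assumed per-node limit, same value as in the thread

# One node whose two reducers are both still copying map output.
free_today = REDUCE_LIMIT - reduce_slots_in_use(2, 0, exempt_copy_phase=False)
free_proposed = REDUCE_LIMIT - reduce_slots_in_use(2, 0, exempt_copy_phase=True)
print(free_today, free_proposed)
```

With the exemption, a node full of copying reducers would still report
free capacity for the sort and reduce work.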



On 7/20/06, Yoram Arnon <[EMAIL PROTECTED]> wrote:
"mapred.tasktracker.tasks.maximum" does apply per task type.
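
As a quick check against the figures in the quoted message (21 nodes, a
limit of 2, 544 map tasks, 41 reduce tasks), the arithmetic can be
sketched as follows. It assumes every one of the 21 nodes runs a task
tracker, which the thread does not state explicitly.

```python
import math

nodes = 21            # cluster size from the quoted message
tasks_maximum = 2     # mapred.tasktracker.tasks.maximum, applied per task type
map_tasks = 544       # map tasks in the generate job
reduce_tasks = 41     # reduce tasks in the generate job

# With a per-type limit, each node offers this many map slots
# and, separately, this many reduce slots.
slots_per_type = nodes * tasks_maximum

# Number of rounds needed to run every map task if all map slots stay busy.
map_waves = math.ceil(map_tasks / slots_per_type)

# All reducers can be launched at once if they fit in the reduce slots.
reduce_fits = reduce_tasks <= slots_per_type

print(slots_per_type, map_waves, reduce_fits)
```

The 42 slots per type agree with the observation that only 42 map tasks
ran at a time while all 41 reducers were already scheduled.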

The reason reduce tasks launch from the get go is that they collect the
output from map tasks as soon as it's available. The observation is that the
shuffle of the data from map tasks to reduce tasks over the network is often
the number one bottleneck of the entire job, so starting it early and
keeping the network saturated throughout job execution optimizes job
execution time.

In your case, ideally your 41 reducers will have almost all their input
ready and waiting when the map tasks complete, and will immediately start
sorting and reducing. More likely, the maps will complete faster than data
can be shipped to the reducers, so the reducers will still wait for it, but
for less time than if they had only just been launched, because data was
being shipped to them throughout map execution.
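
A toy model of that trade-off, under an optimistic assumption that the
shuffle can overlap fully with map execution (in practice data only
becomes available as individual maps finish). The function name and the
durations are made up for illustration.

```python
def reducer_wait_after_maps(map_phase_s, shuffle_s, start_early):
    """Seconds the reducers still wait for input once the last map ends.

    map_phase_s -- wall-clock length of the map phase (hypothetical value)
    shuffle_s   -- time needed to ship all map output (hypothetical value)
    start_early -- True if reducers are launched at job start
    """
    if start_early:
        # Shuffle time already spent during the map phase is not waited for again.
        return max(0, shuffle_s - map_phase_s)
    # Reducers launched only after the maps must do the whole shuffle then.
    return shuffle_s


# Example: a 600 s map phase and 900 s of shuffling.
print(reducer_wait_after_maps(600, 900, start_early=True))
print(reducer_wait_after_maps(600, 900, start_early=False))
```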

Yoram

> -----Original Message-----
> From: Kalbande, Manish [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 20, 2006 11:32 AM
> To: [email protected]
> Subject: Task type priorities during scheduling ?
>
> Hi,
>
> I am running a cluster of 21 nodes.
> While running any task, I observed that reduce tasks are getting
> scheduled long before all the map tasks have finished.
> As a result, reduce tasks wait for map tasks to finish, and the total
> time for the map tasks is longer because they are not getting
> scheduled quickly.
>
> It would be better if reduce tasks were scheduled only after there
> are no map tasks left to be performed.
>
> For example, during the generate job, we had a total of 544 map tasks
> and 41 reduce tasks.
> All 41 reduce tasks got scheduled, and only 42 map tasks could be
> scheduled at a time.
>
> My current configuration:
>
> mapred.map.tasks=83
> mapred.reduce.tasks=41
> mapred.tasktracker.tasks.maximum=2
>
> Also, does "mapred.tasktracker.tasks.maximum" apply per task type,
> or is it for all tasks? From my observation, it appears to be per
> task type.
>
> thanks
> Manish
>
>

