[ 
https://issues.apache.org/jira/browse/HADOOP-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641274#action_12641274
 ] 

Vivek Ratan commented on HADOOP-4428:
-------------------------------------

bq. We don't look at jobs in any other queue, if the job currently initialized 
is not yet running.

I wanted to consider this in some detail. When initTasks() returns, the job is 
still not in a running state, because its setup task needs to run first. For 
discussion sake, assume we have queues Q1, Q2 ... Qn and that we're considering 
queues in that order, starting with Q1. So after we call initTasks() on a job 
in Q1 (say, J1), we have the following options, in order to find a task to run: 
1. We can look at the next job in Q1. This is not a good option, since we'll 
face the same situation: we'll call initTasks() for that job, then look at 
the next job, and so on. 
2. We can look at jobs in the next queue. This is a viable option. It does seem 
a bit unfair, because you're penalizing Q1 for the duration of the time it 
takes for J1's setup task to run, but you could equally well argue that this 
unfairness is temporary and applies equally to all queues. 
3. We can return nothing to the TT. As a result, all TTs that send 
heartbeats to the JT while J1's setup task is running will get 
nothing to run. Most setup tasks should take only a couple of heartbeats, so 
this won't be a frequent problem, but if the setup task contains user code that 
does a lot of work, the problem is exacerbated. 

Upon further reflection, I'd argue for the second approach where we move on to 
the next queue. Returning nothing to the TTs causes unnecessary 
under-utilization. 
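A minimal sketch of the second approach, assuming a simplified model (the class and method names here are hypothetical, not the actual CapacityScheduler code): when a queue's head job has been initialized but is not yet RUNNING because its setup task is pending, we skip to the next queue rather than return nothing to the TT.

```java
import java.util.List;

public class QueueScanSketch {
    // Simplified job states: PREP = initialized but setup task not done yet.
    enum JobState { PREP, RUNNING }

    static class Job {
        final String name;
        JobState state;
        Job(String name, JobState state) { this.name = name; this.state = state; }
    }

    // Scan queues in order (Q1, Q2, ... Qn). If a queue's head job is still
    // in PREP (setup task pending), move on to the next queue instead of
    // returning nothing to the TaskTracker. Returns null only when no queue
    // has a RUNNING job -- the outcome option 3 would produce every time.
    static Job pickJob(List<List<Job>> queues) {
        for (List<Job> queue : queues) {
            if (queue.isEmpty()) {
                continue;
            }
            Job head = queue.get(0);
            if (head.state == JobState.RUNNING) {
                return head; // schedulable job found
            }
            // head is initialized but not RUNNING: skip to the next queue
        }
        return null; // nothing schedulable this heartbeat
    }
}
```

The temporary unfairness to Q1 discussed above shows up here directly: while J1 sits in PREP, every heartbeat is served from a later queue.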

The right way to do things, IMO, is to get the setup/cleanup tasks out of 
initTasks(), which I'll argue elsewhere. But this problem (of initTasks() not 
necessarily changing the job's state to RUNNING) can arise again if we decide 
to call initTasks() in a separate thread, the way it's done in the default 
scheduler. 

> Job Priorities are not handled properly 
> ----------------------------------------
>
>                 Key: HADOOP-4428
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4428
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Cluster:  106 TTs  MapCapacity=212, ReduceCapacity=212
> Single Queue=default, User Limit=25, Priorities = Yes.
> Using hadoop branch 0.19 revision=705159 
>            Reporter: Karam Singh
>            Assignee: Vinod K V
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4428-20081017.1.txt, HADOOP-4428-20081020.txt, 
> HADOOP-4428.patch
>
>
> Job Priorities are not handled properly 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.