[ https://issues.apache.org/jira/browse/HADOOP-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647593#action_12647593 ]

Vivek Ratan commented on HADOOP-4658:
-------------------------------------

There are a couple of things going on here. 

The scheduler is notified by the JT when a job is considered 'done' (see 
HADOOP-4053). At that point, we recompute the number of users who have 
submitted jobs to the queue. If this notification is coming in late, you need 
to see why. But the scheduler's behavior is right in terms of computing the 
user limit. The scheduler doesn't decide when a job is 'done'; the JT does. 
And that, IMO, is correct behavior. 
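To make the recomputation concrete, here is a rough sketch (invented names, 
not the actual CapacityScheduler code) of how a per-user limit might be 
recomputed when the JT reports a job done and its user drops out of the queue:

```java
// Hypothetical sketch of per-user limit recomputation. Assumes each
// active user gets an equal share of the queue's capacity, but never
// less than the configured minimum user-limit percentage (e.g. 25%).
public class UserLimitSketch {

    static int computeUserLimit(int queueCapacity, int numActiveUsers,
                                int ulMinPercent) {
        // Equal split of capacity among users currently in the queue.
        int equalShare = (int) Math.ceil((double) queueCapacity / numActiveUsers);
        // Configured floor: no user is limited below this share.
        int minShare = queueCapacity * ulMinPercent / 100;
        return Math.max(equalShare, minShare);
    }

    public static void main(String[] args) {
        // Capacity 100, user-limit 25%, four active users: 25 slots each.
        System.out.println(computeUserLimit(100, 4, 25)); // prints 25
        // A job finishes and its user leaves the queue; the limit
        // expands because three users now share the capacity.
        System.out.println(computeUserLimit(100, 3, 25)); // prints 34
    }
}
```

The point of the sketch is only that the limit expands when the JT's 'done' 
notification removes a user; if that notification is late, the expansion is 
late too.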

Secondly, if we walk through the entire queue and no job can accept a slot 
(likely because, as in this case, jobs are either over capacity or have no 
tasks to run) and there are no waiting tasks, we still want the slot to be 
used by the queue, so we walk through the queue again, this time without 
considering user limits. Likely, the first job will start getting a bunch of 
slots. This is by design, as it's really hard to argue what is fair in this 
case. Taking one of your examples, suppose the queue capacity is 100 and we 
have four jobs from four different users, each using 25 slots. J3 starts 
finishing up and at some point is only running, say, 5 tasks. Also assume 
there are no waiting jobs. Now, what's the right behavior? Who should get the 
slot? J1, right? Who gets the next slot? You can argue that you want to 
redistribute J3's 20 unused slots among J1, J2, and J4, but that recomputation 
gets really complicated. So we took a simpler approach: free slots are offered 
to jobs in order. J1 will get a bunch of free slots, which will let it finish 
fast. Eventually J3 finishes, is taken out of the queue, and user limits are 
recomputed. We couldn't think of a simpler and fairer approach here. Note 
that this situation is fairly rare. On a regular, well-utilized cluster, 
you'll have a bunch of waiting jobs and they will start running. J3's user's 
first waiting job will start running, which is the right thing. 
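The two-pass walk described above can be sketched roughly like this (invented 
names and a simplified job model, not the actual scheduler code):

```java
import java.util.List;

// Hypothetical sketch of the two-pass slot assignment: pass 1 respects
// the per-user limit; if no job takes the slot, pass 2 walks the queue
// again ignoring user limits so the queue's slot is not wasted.
public class TwoPassAssignment {

    static class Job {
        final String user;
        int runningSlots;
        int pendingTasks;
        Job(String user, int runningSlots, int pendingTasks) {
            this.user = user;
            this.runningSlots = runningSlots;
            this.pendingTasks = pendingTasks;
        }
    }

    // Returns the job that gets the free slot, or null if no job in the
    // queue has a runnable task at all.
    static Job assignSlot(List<Job> queue, int userLimit) {
        // Pass 1: jobs in order, respecting the user limit.
        for (Job j : queue) {
            if (j.pendingTasks > 0 && j.runningSlots < userLimit) {
                return j;
            }
        }
        // Pass 2: every runnable job was at its limit; offer the slot
        // to jobs in order without considering user limits.
        for (Job j : queue) {
            if (j.pendingTasks > 0) {
                return j;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The example from the comment: capacity 100, limit 25, four
        // jobs from four users; J3 is finishing and has nothing pending.
        Job j1 = new Job("u1", 25, 10);
        Job j2 = new Job("u2", 25, 10);
        Job j3 = new Job("u3", 5, 0);
        Job j4 = new Job("u4", 25, 10);
        // Pass 1 finds everyone at the limit, so pass 2 hands the free
        // slot to the first job in the queue: J1.
        System.out.println(assignSlot(List.of(j1, j2, j3, j4), 25) == j1);
    }
}
```

As in the comment, once pass 2 kicks in, J1 keeps receiving the freed slots 
in order until J3 actually finishes and the user limits are recomputed.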

So, in summary, I'd say this is expected behavior. 

As for your comment on speculative tasks being run by J1 - that's really a 
different call. J1 runs a speculative task if it legitimately has a 
speculative task to run, not just because there's a free slot. J1 can come 
back and say it doesn't have any task to run, in which case J2 is looked at 
next. If J1 is running 17 speculative tasks, they're as genuine as, and 
higher priority than, J2's tasks, so I'd say that's still the right behavior. 



> User limit is not expanding back properly.
> ------------------------------------------
>
>                 Key: HADOOP-4658
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4658
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: GC=100% nodes=104, map_capacity=208, 
> reduce_capacity=208, user-limit=25%;
>            Reporter: Karam Singh
>            Assignee: Amar Kamat
>
> User limit is not expanding back properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
