[ https://issues.apache.org/jira/browse/HADOOP-6592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837413#action_12837413 ]
Arun C Murthy commented on HADOOP-6592:
---------------------------------------
I speak from experience: at various times I baby-sit several clusters with
several thousand machines and tens of thousands of jobs as my day job. Doing
'social scheduling' as you suggest breaks down very quickly, and not just at
the very large scale I'm used to. Once you go past tens of nodes and tens of
users, you cannot go find each user at their desk.
As Matei suggests, please take a look at either the fair-scheduler or the
capacity-scheduler. Both are designed to work in a multi-tenant environment
and are reasonably adept at it, though they will need to improve.
> Scheduler: Pause button desirable
> ---------------------------------
>
> Key: HADOOP-6592
> URL: https://issues.apache.org/jira/browse/HADOOP-6592
> Project: Hadoop Common
> Issue Type: Wish
> Reporter: Adam Kramer
> Priority: Minor
>
> It would be lovely if, from the jobtracker page, I could click a button
> that's not "kill" or "fail" but ..."pause."
> The pause button would stop a certain job from starting any more mappers or
> reducers. They would all wait in the "pending" stage until the job is
> "un-paused." Currently-running tasks would continue to run, and then
> complete, thus freeing the resources for other jobs.
> This would help a lot for systems (esp. Hive) in which one or two jobs are
> hogging a lot of mappers or reducers. The ones they have would finish, and
> then other jobs could "catch up," and then they could be unpaused for a
> while. This would also allow for user-level throttling of their jobs in
> instances where they need a lot of resources but have the time to spare.
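
The pause behaviour requested above can be sketched as a toy slot-assignment
loop. This is only an illustration of the semantics described in the issue,
not JobTracker code: the Job class, the paused flag, and the assign_slots and
finish_tasks helpers are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending: int          # tasks not yet started
    running: int = 0      # tasks currently occupying slots
    paused: bool = False  # hypothetical flag; not a real JobTracker field


def assign_slots(jobs, free_slots):
    """Hand free slots to jobs in list order, skipping paused jobs."""
    for job in jobs:
        if job.paused:
            continue  # pending tasks stay pending; running tasks are untouched
        started = min(job.pending, free_slots)
        job.pending -= started
        job.running += started
        free_slots -= started
    return free_slots


def finish_tasks(job, count):
    """Mark up to `count` running tasks as complete; return the freed slots."""
    done = min(count, job.running)
    job.running -= done
    return done


hog = Job("hive-query", pending=100)
small = Job("adhoc", pending=5)

free = assign_slots([hog, small], free_slots=10)  # hog takes all 10 slots
hog.paused = True                                 # the "pause" click
free += finish_tasks(hog, 4)                      # running tasks drain normally
free = assign_slots([hog, small], free)           # freed slots go to `small`
```

In this sketch the paused job keeps its already-running tasks and simply stops
receiving new slots, so the smaller job picks up capacity as it is freed.
Setting paused back to False would let the first job compete for slots again.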