[ https://issues.apache.org/jira/browse/HADOOP-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621865#action_12621865 ]
Vivek Ratan commented on HADOOP-3930:
-------------------------------------
I think we need to first decide whether queues are explicit in this API or not.
The problem with making queues explicit in the API is that every scheduler will
have to support one, or at least a default one. But that's not so bad, IMO.
getSchedulingInfo() should really return key-value pairs for queues, not for
jobs. In the HADOOP-3445 scheduler, for example, we need to display scheduling
information associated with a queue - its capacity (both 'guaranteed' and
'allocated'), how many unique users have submitted jobs, how many tasks are
running, how many are waiting, etc. This information is per queue, and doesn't
make sense per job. I'd much rather have getSchedulingInfo() take in a queue
name as a parameter, if we make queues explicit. In fact, I don't see what kind
of scheduling information you'd associate with a job. Matei, do you have
examples of what getSchedulingInfo would return for jobs?
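To make the queue-scoped variant concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: the interface name QueueSchedulingInfoProvider, the ToyQueueStats class, and the key names are made up for illustration and are not part of TaskScheduler or the HADOOP-3445 patch. It only shows getSchedulingInfo() taking a queue name and returning display-ready key-value pairs.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface, not part of Hadoop: scheduling info keyed by queue.
interface QueueSchedulingInfoProvider {
    // Returns display-ready key-value pairs for the named queue.
    Map<String, String> getSchedulingInfo(String queueName);
}

// Toy stand-in for a scheduler's per-queue bookkeeping (assumed, illustrative only).
class ToyQueueStats implements QueueSchedulingInfoProvider {
    private final Map<String, Map<String, String>> infoByQueue =
        new LinkedHashMap<>();

    // Records the latest counters for one queue; key names are assumptions.
    void report(String queueName, int guaranteedCapacity, int allocatedCapacity,
                int uniqueUsers, int runningTasks, int waitingTasks) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("Guaranteed capacity", String.valueOf(guaranteedCapacity));
        info.put("Allocated capacity", String.valueOf(allocatedCapacity));
        info.put("Unique users", String.valueOf(uniqueUsers));
        info.put("Running tasks", String.valueOf(runningTasks));
        info.put("Waiting tasks", String.valueOf(waitingTasks));
        infoByQueue.put(queueName, info);
    }

    @Override
    public Map<String, String> getSchedulingInfo(String queueName) {
        Map<String, String> info = infoByQueue.get(queueName);
        if (info == null) {
            // Unknown queue: nothing to display.
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(info);
    }
}
```

The web UI or CLI would then iterate over the known queue names and render one row per queue, with the map keys as column headers.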
Similarly, getJobComparator() makes more sense when applied to a queue. In
HADOOP-3445, jobs are ordered per queue, and there is no global ordering.
Furthermore, doesn't it make more sense to get a sorted collection of jobs,
per queue, back from the scheduler, rather than a Comparator? Or do you
imagine the UI and CLI maintaining a list of jobs all the time and applying
the comparator periodically?
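A rough sketch of the "sorted collection per queue" alternative follows. The names ToyJob, ToyQueueOrdering, and getSortedJobs are hypothetical, ToyJob stands in for JobInProgress, and the ordering used (priority, then submit time) is only an assumption for the example, not what any existing scheduler does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for JobInProgress; only the fields this sketch needs.
class ToyJob {
    final String id;
    final String queueName;
    final int priority;     // assumption: a lower value runs first
    final long submitTime;  // assumption: earlier submission breaks ties

    ToyJob(String id, String queueName, int priority, long submitTime) {
        this.id = id;
        this.queueName = queueName;
        this.priority = priority;
        this.submitTime = submitTime;
    }
}

// Hypothetical scheduler-side view that hands back jobs already ordered per queue.
class ToyQueueOrdering {
    private final Map<String, List<ToyJob>> jobsByQueue = new HashMap<>();

    // The ordering stays private to the scheduler; callers never see a Comparator.
    private static final Comparator<ToyJob> ORDER =
        Comparator.comparingInt((ToyJob j) -> j.priority)
                  .thenComparingLong(j -> j.submitTime);

    void addJob(ToyJob job) {
        jobsByQueue.computeIfAbsent(job.queueName, q -> new ArrayList<>()).add(job);
    }

    // Returns a sorted snapshot for one queue; no ordering is defined across queues.
    List<ToyJob> getSortedJobs(String queueName) {
        List<ToyJob> jobs = jobsByQueue.get(queueName);
        if (jobs == null) {
            return Collections.emptyList();
        }
        List<ToyJob> sorted = new ArrayList<>(jobs);
        sorted.sort(ORDER);
        return sorted;
    }
}
```

With this shape the UI and CLI just ask for the current list for each queue whenever they refresh, and do not need to hold their own job list or know how the scheduler orders jobs.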
> Decide how to integrate scheduler info into CLI and job tracker web page
> ------------------------------------------------------------------------
>
> Key: HADOOP-3930
> URL: https://issues.apache.org/jira/browse/HADOOP-3930
> Project: Hadoop Core
> Issue Type: Improvement
> Reporter: Matei Zaharia
> Priority: Minor
>
> We need a way for job schedulers such as HADOOP-3445 and HADOOP-3476 to
> provide info to display on the JobTracker web interface and in the CLI. The
> main things needed seem to be:
> * A way for schedulers to provide info to show in a column on the web UI and
> in the CLI - something as simple as a single string, or a map<string, int>
> for multiple parameters.
> * Some sorting order for jobs - maybe a method to sort a list of jobs.
> Let's figure out what the best way to do this is and implement it in the
> existing schedulers.
> My first-order proposal at an API: Augment the TaskScheduler with
> * public Map<String, String> getSchedulingInfo(JobInProgress job) -- returns
> key-value pairs which are displayed in columns on the web UI or the CLI.
> * public Comparator<JobInProgress> getJobComparator() -- returns a comparator
> that can be used to determine the order in which jobs will be run, for
> sorting the jobs in the CLI.