[
https://issues.apache.org/jira/browse/MAPREDUCE-740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729400#action_12729400
]
Hong Tang commented on MAPREDUCE-740:
-------------------------------------
@vinod
I do have a specific use case: we want to keep track of the amount of
resources used by each job, each user, or each queue (for the capacity
scheduler). Granted, all of this information is readily available in the job
history logs. However, depending on the job history logs has a few drawbacks:
(1) we are interested in keeping a history of finished jobs and possibly doing
group-bys by user and queue, so scraping individual history logs is messy;
(2) it adds a dependency on keeping up with possible future changes to the
history log format.
For starters, I think the summary should include the following information:
- job queuing/waiting time
- job start time
- job finish time
- total maps/reduces
- user id
- job id (job-tracker ID + job sequence number)
- map/reduce slot hours (need to apply a multiplier for high-RAM tasks
that take multiple slots per map/reduce task)
- queue name
- job status (success or failure)
- cluster map/reduce slot capacity
The only thing the job history log does not currently provide is the slot
hours for all maps and reduces belonging to the same job.
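To make the slot-hours item concrete, here is a minimal sketch of how it could be computed and folded into a one-line summary. Everything in it is an assumption for illustration: `TaskAttempt`, `slot_hours`, `summary_line`, the key=value layout, and the sample job id are hypothetical and are not part of Hadoop or of any proposed patch. It counts every attempt it is given; whether failed or speculative attempts should count is left open.

```python
from dataclasses import dataclass
from typing import Iterable

MS_PER_HOUR = 3_600_000


@dataclass
class TaskAttempt:
    """Hypothetical record of one task attempt (not a Hadoop class)."""
    kind: str        # "map" or "reduce"
    start_ms: int    # attempt start time, in milliseconds
    finish_ms: int   # attempt finish time, in milliseconds
    slots: int = 1   # slots occupied; greater than 1 for high-RAM tasks


def slot_hours(attempts: Iterable[TaskAttempt], kind: str) -> float:
    """Sum (run time x slots held) over all attempts of one kind, in hours."""
    total_ms = sum(
        (a.finish_ms - a.start_ms) * a.slots
        for a in attempts
        if a.kind == kind
    )
    return total_ms / MS_PER_HOUR


def summary_line(job_id, user, queue, status, attempts) -> str:
    """Format a one-line key=value summary (the layout is an assumption)."""
    attempts = list(attempts)  # allow two passes, even over a generator
    return (
        f"jobId={job_id},user={user},queue={queue},status={status},"
        f"mapSlotHours={slot_hours(attempts, 'map'):.2f},"
        f"reduceSlotHours={slot_hours(attempts, 'reduce'):.2f}"
    )


# Two 30-minute maps (one is a high-RAM task holding 2 slots)
# and one 60-minute reduce.
attempts = [
    TaskAttempt("map", 0, 1_800_000),
    TaskAttempt("map", 0, 1_800_000, slots=2),
    TaskAttempt("reduce", 0, 3_600_000),
]
print(summary_line("job_200907010000_0001", "alice", "default", "SUCCESS", attempts))
```

With the sample data, the high-RAM map contributes 1.0 slot hour and the ordinary map 0.5, so the line reports mapSlotHours=1.50 and reduceSlotHours=1.00. Lines in this shape can be grouped by user or queue with a simple split on commas, which is the point of not scraping the full history logs.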
> Provide summary information per job once a job is finished.
> -----------------------------------------------------------
>
> Key: MAPREDUCE-740
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-740
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Hong Tang
> Priority: Minor
>
> It would be nice if the JobTracker could output one line of summary
> information per job once a job is finished. Otherwise, users or system
> administrators would end up scraping individual job history logs.