[
https://issues.apache.org/jira/browse/HADOOP-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated HADOOP-3708:
----------------------------------
Attachment: history-scripts.patch
I've attached a patch with the scripts I built at Facebook. There is also a
readme that explains what they do. There are two scripts. The first parses the
history logs and jobconfs and loads the same data, unchanged, into MySQL,
creating tables of jobs, jobconf XML key-value pairs, tasks and task attempts.
The second performs joins on these tables to create a set of job summary
reports, with ~1 KB of data per job, that can be queried quickly for
visualization. You can build a visualization on top of this database using
your favorite tool. Unfortunately I can't open source the one used at Facebook
because it depends on a lot of internal web libraries.
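
For readers who have not opened the patch, the two-stage design described
above can be sketched roughly as follows. This is not the attached code: the
table names, column names and sample rows are hypothetical, and the sketch
uses Python's built-in sqlite3 module in place of MySQL so that it runs
without a database server. Stage one is stood in for by a few hand-written
inserts (the real script parses history logs and jobconfs); stage two is the
join that builds one compact summary row per job.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only
# and are not taken from history-scripts.patch.
SCHEMA = """
CREATE TABLE jobs (job_id TEXT PRIMARY KEY, username TEXT,
                   submit_time INTEGER, finish_time INTEGER);
CREATE TABLE jobconf (job_id TEXT, name TEXT, value TEXT);
CREATE TABLE tasks (task_id TEXT PRIMARY KEY, job_id TEXT, task_type TEXT);
CREATE TABLE task_attempts (attempt_id TEXT PRIMARY KEY, task_id TEXT,
                            status TEXT);
"""


def load_sample(conn):
    """Stand-in for stage one: create the raw tables and load made-up rows."""
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO jobs VALUES ('job_1', 'alice', 100, 160)")
    conn.execute(
        "INSERT INTO jobconf VALUES ('job_1', 'mapred.reduce.tasks', '1')")
    conn.executemany(
        "INSERT INTO tasks VALUES (?, ?, ?)",
        [("t_m0", "job_1", "MAP"), ("t_r0", "job_1", "REDUCE")])
    conn.executemany(
        "INSERT INTO task_attempts VALUES (?, ?, ?)",
        [("a0", "t_m0", "FAILED"), ("a1", "t_m0", "SUCCESS"),
         ("a2", "t_r0", "SUCCESS")])


def build_summary(conn):
    """Stage two: join the raw tables into one summary row per job."""
    conn.execute("""
        CREATE TABLE job_summary AS
        SELECT j.job_id,
               j.username,
               j.finish_time - j.submit_time AS run_seconds,
               COUNT(DISTINCT t.task_id) AS num_tasks,
               COUNT(a.attempt_id) AS num_attempts,
               SUM(CASE WHEN a.status = 'FAILED' THEN 1 ELSE 0 END)
                   AS failed_attempts
        FROM jobs j
        LEFT JOIN tasks t ON t.job_id = j.job_id
        LEFT JOIN task_attempts a ON a.task_id = t.task_id
        GROUP BY j.job_id, j.username, j.submit_time, j.finish_time
    """)


def summarize_sample():
    """Run both stages on an in-memory database and return the summary rows."""
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    build_summary(conn)
    rows = conn.execute(
        "SELECT job_id, username, run_seconds, num_tasks, num_attempts, "
        "failed_attempts FROM job_summary").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in summarize_sample():
        print(row)
```

Keeping the raw tables separate from the summary table means the summary can
be rebuilt with different joins without re-parsing the logs, while
visualization queries only touch the small per-job rows.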
> Provide accounting functionality for Hadoop resource manager
> ------------------------------------------------------------
>
> Key: HADOOP-3708
> URL: https://issues.apache.org/jira/browse/HADOOP-3708
> Project: Hadoop Core
> Issue Type: New Feature
> Components: contrib/capacity-sched
> Reporter: Hemanth Yamijala
> Assignee: Hemanth Yamijala
> Fix For: 0.20.0
>
> Attachments: history-scripts.patch
>
>
> HADOOP-3421 describes requirements for a new resource manager in Hadoop to
> schedule Map/Reduce jobs. In production systems, it would be useful to
> produce accounting information related to the scheduling - such as job start
> and run times, resources used, etc. This information can be consumed by other
> systems to build accounting for shared resources. This JIRA is for tracking
> the requirements, approach and implementation for producing accounting
> information.