[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744415#action_12744415
 ] 

Jothi Padmanabhan commented on MAPREDUCE-157:
---------------------------------------------

Regarding the interface for readers, we could support two kinds of users:

# Users who want fine-grained control and would handle the individual events 
themselves. 
# Users who want coarser-grained, summary information. 

Users of type 1, who want fine-grained information, could use Event Readers 
to iterate through the events and do the necessary processing themselves.
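
As a rough illustration of the type 1 usage, here is a minimal sketch of 
iterating over events with a reader. The names HistoryEvent, EventReader, 
getNextEvent() and EventCounter are assumptions made up for this example, not 
a settled API; the real event classes would carry far more detail.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event type; the real event classes are not defined here.
final class HistoryEvent {
    final String type;     // e.g. "JOB_SUBMITTED", "TASK_STARTED"
    final long timestamp;

    HistoryEvent(String type, long timestamp) {
        this.type = type;
        this.timestamp = timestamp;
    }
}

// Hypothetical reader: hands out one event at a time, null when exhausted.
// A real reader would decode events from a history file or stream.
final class EventReader {
    private final Iterator<HistoryEvent> source;

    EventReader(Iterable<HistoryEvent> events) {
        this.source = events.iterator();
    }

    HistoryEvent getNextEvent() {
        return source.hasNext() ? source.next() : null;
    }
}

final class EventCounter {
    // Type 1 usage: the caller walks every event and decides what to do
    // with it. Here the "processing" is just counting events per type.
    static Map<String, Integer> countByType(EventReader reader) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        HistoryEvent event;
        while ((event = reader.getNextEvent()) != null) {
            counts.merge(event.type, 1, Integer::sum);
        }
        return counts;
    }
}
```

The point of this style is that the reader imposes no structure: the caller 
sees each event in order and keeps whatever state it needs.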

For users of type 2, we could provide the summary information through a 
JobHistoryParser class. This class would internally build the Job-Task-Attempt 
hierarchy by consuming all events using an event reader and make the summary 
information available for users to access. Users could do something like:

{code}
// Sketch of the proposed usage; names and signatures are not final.
// parser is a JobHistoryParser; historyFileOrStream is the history file or stream.
parser.init(historyFileOrStream);

// Getters on JobInfo would expose start time, finish time, counters, id,
// user name, conf, total maps, and total reduces, among others.
JobInfo jobInfo = parser.getJobInfo();

// Getters on TaskInfo would include task id, task type, status, splits,
// counters, etc.
List<TaskInfo> taskInfoList = jobInfo.getAllTasks();

for (TaskInfo taskInfo : taskInfoList) {
  // Getters on TaskAttemptInfo would include attempt id, errors, status,
  // state, start time, finish time, tracker name, port, etc.
  List<TaskAttemptInfo> attemptsList = taskInfo.getAllAttempts();
}
{code}
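
To make the "consume all events and build the hierarchy" idea concrete, here 
is a minimal sketch of what the parser might do internally. Everything in it 
is an assumption for illustration: the AttemptEvent type, its fields, and an 
init() that takes already-decoded events rather than a history file or stream. 
Only the class and getter names JobHistoryParser, JobInfo, TaskInfo, 
TaskAttemptInfo, getJobInfo(), getAllTasks() and getAllAttempts() come from 
the proposal above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified event: one status update for one attempt.
final class AttemptEvent {
    final String taskId;
    final String attemptId;
    final String status;

    AttemptEvent(String taskId, String attemptId, String status) {
        this.taskId = taskId;
        this.attemptId = attemptId;
        this.status = status;
    }
}

final class TaskAttemptInfo {
    final String attemptId;
    String status;   // last status seen for this attempt

    TaskAttemptInfo(String attemptId) { this.attemptId = attemptId; }
}

final class TaskInfo {
    final String taskId;
    final Map<String, TaskAttemptInfo> attempts = new LinkedHashMap<>();

    TaskInfo(String taskId) { this.taskId = taskId; }

    List<TaskAttemptInfo> getAllAttempts() {
        return new ArrayList<>(attempts.values());
    }
}

final class JobInfo {
    final Map<String, TaskInfo> tasks = new LinkedHashMap<>();

    List<TaskInfo> getAllTasks() {
        return new ArrayList<>(tasks.values());
    }
}

final class JobHistoryParser {
    private final JobInfo jobInfo = new JobInfo();

    // Assumption: takes decoded events; the proposal takes a file or stream.
    void init(Iterable<AttemptEvent> events) {
        for (AttemptEvent e : events) {
            // Create the task and attempt on first sight, then fold the
            // event into the attempt so later events overwrite earlier ones.
            TaskInfo task = jobInfo.tasks.computeIfAbsent(e.taskId, TaskInfo::new);
            TaskAttemptInfo attempt =
                task.attempts.computeIfAbsent(e.attemptId, TaskAttemptInfo::new);
            attempt.status = e.status;
        }
    }

    JobInfo getJobInfo() { return jobInfo; }
}
```

Keying tasks and attempts by id in insertion-ordered maps lets each event be 
folded in with a single lookup while preserving the order in which tasks first 
appeared in the history.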


Comments/Suggestions/Thoughts?

> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
>
> Currently, parsing the job history logs with external tools is very difficult 
> because of the format. The most critical problem is that newlines aren't 
> escaped in the strings. That makes using tools like grep, sed, and awk very 
> tricky.
