[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741221#action_12741221
 ] 

Jothi Padmanabhan commented on MAPREDUCE-157:
---------------------------------------------

Based on an offline discussion with Owen, Sharad and Devaraj, it does not 
appear that we have really strong use cases to support multiple formats for the 
JobHistory file. As a result, we will tie the format strongly to JSON and will 
focus on reducing the number of objects created, writing information directly to 
the underlying stream wherever possible. While we will retain the event 
framework, we will simplify the interface compared to the previous design. 

One change is to write the event type before the actual event object, so that 
event readers can read the event type first and then create the correct 
event class based on that type. We will, however, still have only one record 
per line. An entry in the history file will now look like this:

{noformat}
{"EVENT_TYPE":"JOB_SUBMITTED"} 
{"EVENT_KIND":"JOB","JOB_ID":"job_test_0000","JOB_NAME":"TEST-JOB-SUBMITTED","USER_NAME":"Jothi","SUBMIT_TIME":1249887005100,"JOB_CONF_PATH":"/tmp"}
{noformat}
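
The two-line layout above lets a reader choose the event class before it parses 
the event body. A minimal sketch of that dispatch follows. The class 
EventTypeDispatch is hypothetical and not part of the proposed patch, and a 
regular expression stands in for the real JsonParser, so it only understands 
the simple one-field type line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the proposed patch: pairs each
// event-type line with the event-body line that follows it.
public class EventTypeDispatch {
  // Stand-in for a JSON parser; matches only {"EVENT_TYPE":"<NAME>"}.
  private static final Pattern TYPE_LINE = Pattern.compile(
      "^\\s*\\{\\s*\"EVENT_TYPE\"\\s*:\\s*\"([A-Z_]+)\"\\s*\\}\\s*$");

  // Returns one "<type> -> <body>" string per event found in the input.
  public static List<String> readEvents(BufferedReader in) throws IOException {
    List<String> events = new ArrayList<>();
    String typeLine;
    while ((typeLine = in.readLine()) != null) {
      if (typeLine.trim().isEmpty()) {
        continue; // tolerate blank lines between entries
      }
      Matcher m = TYPE_LINE.matcher(typeLine);
      if (!m.matches()) {
        throw new IOException("Expected an event type line, got: " + typeLine);
      }
      String body = in.readLine();
      if (body == null) {
        throw new IOException("Missing event body for " + m.group(1));
      }
      // A real reader would instantiate the event class for m.group(1)
      // here and call readFields on it instead of keeping the raw line.
      events.add(m.group(1) + " -> " + body);
    }
    return events;
  }

  public static void main(String[] args) throws IOException {
    String sample = "{\"EVENT_TYPE\":\"JOB_SUBMITTED\"} \n"
        + "{\"EVENT_KIND\":\"JOB\",\"JOB_ID\":\"job_test_0000\"}\n";
    for (String ev : readEvents(new BufferedReader(new StringReader(sample)))) {
      System.out.println(ev);
    }
  }
}
```

Because the type is known before the body is touched, the reader never has to 
buffer or re-parse the event object to find out what it is.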

Events will now implement writeFields(JsonGenerator) and readFields(JsonParser) 
methods.

The JobHistory module would create one event writer per jobId; each event 
writer would correspond to one history file. The event writer would also 
internally create a JsonGenerator on this file and use it for 
writing the actual event (by calling event.writeFields).

Similarly, the job history reading module would create one event reader per 
jobId/file. The reader would internally create one JsonParser that would be 
passed to the individual events' readFields method.

{code}

interface HistoryEvent {
  void writeFields(JsonGenerator gen) throws IOException;
  void readFields(JsonParser parser) throws IOException;
}

class JobHistory {
  ...
  // Generate a history file based on jobId, then create a new event writer
  JsonEventWriter eventWriter = new JsonEventWriter(conf, historyFile);
  eventWriter.write(jobSubmittedEvent);
  eventWriter.write(jobFinishedEvent);
  ....
  eventWriter.close();
}

class SomeHistoryEventUser {
  JsonEventReader eventReader = new JsonEventReader(conf, historyFile);
  HistoryEvent ev;
  while ((ev = eventReader.getNextEvent()) != null) {
    // process ev
  }
  eventReader.close();
}

{code}
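
To make the "one writer per jobId, write straight to the stream" idea concrete, 
here is a rough, self-contained sketch. All names in it (PerJobWriters, 
SketchEvent, escape, jobSubmitted) are hypothetical. A plain java.io.Writer 
stands in for JsonGenerator and a StringWriter stands in for the history file, 
so this is an illustration of the shape, not the proposed implementation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins: a plain Writer plays the role of JsonGenerator,
// and a StringWriter plays the role of the per-job history file.
public class PerJobWriters {
  public interface SketchEvent {
    String type();
    void writeFields(Writer out) throws IOException;
  }

  // Escapes backslash, double quote, and line breaks so that a record
  // never spans lines. Not a complete JSON string escaper (other control
  // characters are left alone).
  public static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"")
            .replace("\n", "\\n").replace("\r", "\\r");
  }

  // Builds an event that writes its fields directly to the stream,
  // without creating an intermediate object tree.
  public static SketchEvent jobSubmitted(String jobId, String jobName) {
    return new SketchEvent() {
      public String type() {
        return "JOB_SUBMITTED";
      }
      public void writeFields(Writer out) throws IOException {
        out.write("{\"JOB_ID\":\"" + escape(jobId)
            + "\",\"JOB_NAME\":\"" + escape(jobName) + "\"}");
      }
    };
  }

  private final Map<String, Writer> writers = new HashMap<>();

  // Lazily creates one writer per job id, then emits the type line
  // followed by the event line.
  public void write(String jobId, SketchEvent ev) throws IOException {
    Writer out = writers.computeIfAbsent(jobId, id -> new StringWriter());
    out.write("{\"EVENT_TYPE\":\"" + ev.type() + "\"}\n");
    ev.writeFields(out);
    out.write("\n");
  }

  // Returns everything written so far for the job, or "" if none.
  public String contents(String jobId) {
    Writer w = writers.get(jobId);
    return w == null ? "" : w.toString();
  }
}
```

Escaping line breaks inside string values is what keeps each record on a single 
line, which is the property the issue description asks for so that grep, sed, 
and awk remain usable.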

> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
>
> Currently, parsing the job history logs with external tools is very difficult 
> because of the format. The most critical problem is that newlines aren't 
> escaped in the strings. That makes using tools like grep, sed, and awk very 
> tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
