[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738817#action_12738817 ]

Jothi Padmanabhan commented on MAPREDUCE-157:
---------------------------------------------

Here is a proposal for writing JobHistory events in JSON format.

# We will decouple the generation of events from the actual writing/reading of 
events. The JobHistory module will generate events and pass them on to Event 
Writers, which will do the actual writing of events to the underlying stream. 
Similarly, on the reading side, Event Readers will read data from the 
underlying stream and generate events, which will then be passed on to the 
callers (history viewers, other external log aggregators).
# In addition, there will be a provision to stream events directly to external 
listeners as they are generated (see the HistoryListener interface in the 
code snippet below).
# The framework's event writer will write the events to a local file in JSON 
format, using Jackson (http://jackson.codehaus.org/) for serialization.
# For modularity, we will have abstract classes for HistoryEvent, 
HistoryEventWriter and HistoryEventReader. Events will have a kind and a type. 
Examples of kinds include Job, Task and TaskAttempt. Each kind can support 
multiple types. Example types for Job include Submitted, Inited and Finished 
(among others).
# When writing JSON data, each record will be on a separate line by itself. 
There will not be any newlines within a record.
# Each event class will support a toJSON() method that serializes the 
event into a JsonNode. Event writers can use this method to write the event in 
JSON format to the underlying stream. If an event writer wants to write in 
a different format, it can either parse this JsonNode object or 
query the event itself after ascertaining its kind and type.
# Similarly, each event class will support a constructor that takes a JsonNode 
object, used by event readers to recreate the event instance while reading 
from the underlying stream.
# Currently, the JobConf object is stored in a separate file, independent of 
the actual job history file. We could instead store the conf contents as part 
of the history file itself, by wrapping the conf object in a special event 
that is logged at job submission time.
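To make point 5 concrete, here is a minimal plain-JDK sketch (not part of the patch; the class and method names are hypothetical) showing how JSON string escaping keeps each record on a single line even when a field value, such as a job name, contains a raw newline:

```java
// Hypothetical sketch illustrating the one-record-per-line invariant.
// JSON string escaping turns embedded newlines into the two-character
// sequence \n, so each serialized record stays greppable on one line.
public class JsonLineSketch {

  // Escape the characters that would otherwise break the one-line rule.
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (char c : s.toCharArray()) {
      switch (c) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n");  break;
        case '\r': sb.append("\\r");  break;
        case '\t': sb.append("\\t");  break;
        default:   sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // A job name containing a newline -- the case that breaks the old format.
    String record = "{\"EVENT_KIND\":\"JOB\",\"EVENT_TYPE\":\"SUBMITTED\","
        + "\"JOB_NAME\":\"" + escape("line one\nline two") + "\"}";
    // The serialized record contains no raw newline.
    System.out.println(record.contains("\n") ? "BROKEN" : "ONE-LINE");
    System.out.println(record);
  }
}
```

With this invariant in place, line-oriented tools like grep, sed and awk can treat one line as one event.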

Here are some illustrative code snippets:

{code}

public abstract class HistoryEvent {

  protected String type;
  protected HistoryEventKind kind;
  
  public static enum HistoryEventKind {JOB, TASK, TASKATTEMPT, ...}

  public String getEventType() { return type; }
  public HistoryEventKind getEventKind() { return kind; }
  
  public abstract JsonNode toJSON(JsonNodeFactory jnf);
  
  public HistoryEvent(JsonNode node) { }

  public HistoryEvent() {}
}

public abstract class JobHistoryEvent extends HistoryEvent {
  public JobHistoryEvent() { kind = HistoryEventKind.JOB; }
  public JobHistoryEvent(JsonNode node) { kind = HistoryEventKind.JOB;}
}

// An example implementation of the JobSubmittedEvent

public class JobSubmittedEvent extends JobHistoryEvent {

  private JobID jobid;
  private String jobName;
  private String userName;
  private long submitTime;
  private Path jobConfPath;
  
  public JobSubmittedEvent(JobID id, String jobName, String userName,
      long submitTime, Path jobConfPath) {
    super();
    this.jobid = id;
    this.jobName = jobName;
    this.userName = userName;
    this.submitTime = submitTime;
    this.jobConfPath = jobConfPath;
    type = "SUBMITTED";
  }
  
  public JobID getJobid() { return jobid; }
  public String getJobName() { return jobName; }
  public String getUserName() { return userName; }
  // other getters
  
  public JobSubmittedEvent(JsonNode node) {
    
  // Code to generate event from JsonNode
    
  }

  @Override
  public JsonNode toJSON(JsonNodeFactory jnf) {
    ObjectNode node = new ObjectNode(jnf);
    node.put("EVENT_KIND", kind.toString());
    node.put("EVENT_TYPE", type);
    node.put("JOB_ID", jobid.toString());
    node.put("JOB_NAME", jobName);
    node.put("USER_NAME", userName);
    node.put("SUBMIT_TIME", submitTime);
    node.put("JOB_CONF_PATH", jobConfPath.toString());
    return node;
  }

}

public abstract class HistoryEventWriter {

  public abstract void open(String name) throws IOException;

  public abstract void write(HistoryEvent event) throws IOException;

  public abstract void flush() throws IOException;

  public abstract void close() throws IOException;
}


public abstract class HistoryEventReader {

  public abstract void open(String name) throws IOException;

  public abstract Iterator<HistoryEvent> iterator();

  public abstract void close() throws IOException;

}

public interface HistoryListener {
  public void handleHistoryEvent(HistoryEvent event);
}

{code}
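To show how the reader contract above could play out against the one-record-per-line format, here is a self-contained sketch (illustrative only; the class name is hypothetical, and a String stands in for the parsed HistoryEvent so the sketch has no Jackson dependency) that exposes a history stream as an Iterator, mirroring HistoryEventReader's open/iterator/close shape:

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of the reader side: since each record is one line,
// iterating events reduces to iterating lines of the underlying stream.
public class LineEventReader implements Iterable<String>, Closeable {

  private final BufferedReader in;

  public LineEventReader(Reader source) {
    this.in = new BufferedReader(source);
  }

  @Override
  public Iterator<String> iterator() {
    return new Iterator<String>() {
      private String next = readLine();

      private String readLine() {
        try { return in.readLine(); }   // one record per line
        catch (IOException e) { throw new RuntimeException(e); }
      }

      public boolean hasNext() { return next != null; }

      public String next() {
        if (next == null) throw new NoSuchElementException();
        String cur = next;
        next = readLine();
        return cur;
      }

      public void remove() { throw new UnsupportedOperationException(); }
    };
  }

  @Override
  public void close() throws IOException { in.close(); }

  public static void main(String[] args) throws IOException {
    String history =
        "{\"EVENT_KIND\":\"JOB\",\"EVENT_TYPE\":\"SUBMITTED\"}\n"
      + "{\"EVENT_KIND\":\"JOB\",\"EVENT_TYPE\":\"FINISHED\"}\n";
    try (LineEventReader reader = new LineEventReader(new StringReader(history))) {
      for (String record : reader) {
        System.out.println(record);   // each record is a complete JSON object
      }
    }
  }
}
```

A real implementation would parse each line into a JsonNode and dispatch on EVENT_KIND/EVENT_TYPE to the matching event constructor, but the iteration skeleton would look much the same.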

> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
>
> Currently, parsing the job history logs with external tools is very difficult 
> because of the format. The most critical problem is that newlines aren't 
> escaped in the strings. That makes using tools like grep, sed, and awk very 
> tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
