[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753542#action_12753542
 ] 

Sharad Agarwal commented on MAPREDUCE-157:
------------------------------------------

Some comments:
 - Responsibilities of JobHistory and JobHistoryWriter are not clear. I think 
these should be merged in single class and name it say JobHistoryManager
 - Base interface HistoryEvent should be public
 - Getters in event classes should be public
 - HISTORY_VERSION should be 1.0 instead of '0.21'
 - JobHistory#initDone is swallowing the exception
 - Code for localizing of job conf should be moved out of history
 - JobHistory has two logEvents methods. That it confusing. Better would be to 
just have one logEvent method, and no state check in logEvents. JobInProgress 
will explicitly create/close job history writers
 - HistoryCleaner thread should have a check as lastRan != 0 and not lastRan == 
0. Also it should use doneDirFs
 - Using: 
    if (jobName == null || jobName.length() == 0) {
      jobName = "NA";
    }
is not a nice thing. This was perhaps done for some reason in the old format 
but in new one not required. we can have empty string in json
 - file name encoding not required. JobHistoryUtil class can go away
 - TaskInProgress and JobInProgress do a jobHistory != null check everywhere. 
This is not elegant.
 - test case is missed from TestJobHistory#testDoneFolderOnHDFS 
 - EventReader -> better msg in readOneGroup - > throw new 
IOException("Internal error");
 - EventReader -> getHistoryEventType need not be static. It can use the 
instance parser handle
 - History Viewer command line has changed. It takes file name instead of 
output folder. JobClient#displayUsage and commands manual documentation should 
be corrected. 
 - All public APIs including constructors must be documentated






> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>    Affects Versions: 0.20.1
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.21.0
>
>         Attachments: mapred-157-10Sep.patch, mapred-157-4Sep.patch, 
> mapred-157-7Sep-v1.patch, mapred-157-7Sep.patch, mapred-157-prelim.patch, 
> MAPREDUCE-157-avro.patch
>
>
> Currently, parsing the job history logs with external tools is very difficult 
> because of the format. The most critical problem is that newlines aren't 
> escaped in the strings. That makes using tools like grep, sed, and awk very 
> tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to