[ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743161#action_12743161
 ] 

Sharad Agarwal commented on MAPREDUCE-157:
------------------------------------------

bq. Is the implicit schema proposed here Map<String,String>? For example, would 
integer values be written as JSON strings, with quotes, or as JSON integers, 
without quotes? If the schema is Map<String,String> and will be for all time, 
then there's less point to using Avro. But if fields are typed it might be nice 
to record the types in a schema.
I think that is a reasonable statement. Apart from types, we would also like to 
have nested records (counter info, etc.), not just key-value pairs. So Avro 
looks like a good fit to me. 
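
As a rough illustration of typed fields plus a nested record, here is a minimal sketch of what one history event schema could look like. The record and field names (TaskFinished, Counter, taskid, finishTime, counters) are assumptions made up for this example, not a proposed format. The schema is written as a Python dict in Avro's JSON schema notation, and the event is serialized with the standard json module rather than an Avro library.

```python
import json

# Hypothetical nested record: one counter with a typed value.
COUNTER = {
    "type": "record",
    "name": "Counter",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "value", "type": "long"},
    ],
}

# Hypothetical event record: typed fields plus an array of nested Counter records.
TASK_FINISHED = {
    "type": "record",
    "name": "TaskFinished",
    "fields": [
        {"name": "taskid", "type": "string"},
        {"name": "finishTime", "type": "long"},
        {"name": "counters", "type": {"type": "array", "items": COUNTER}},
    ],
}


def encode_event(event):
    """Serialize one event as a single JSON line.

    Integers are written without quotes, and json.dumps escapes any
    newline inside a string value, so each event stays on one line.
    """
    return json.dumps(event, sort_keys=True)


event = {
    "taskid": "task_0001_m_000000",
    "finishTime": 1250000000000,
    "counters": [{"name": "MAP_INPUT_RECORDS", "value": 42}],
}
line = encode_event(event)
print(line)
```

With a Map<String,String> schema, finishTime and the counter value would have to be quoted strings and the counters would have to be flattened into keys; here they keep their types and nesting.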

bq. We could, for example, make the first line of log files the schema, or 
write a side file, but there's not much point to Avro data without storing a 
schema.
I think a side file would be better, as it won't bloat each log file with the 
same info. We can have a union schema comprising all history events. Perhaps the 
first line could just be the Hadoop version number, as the schema file would 
correspond to it.
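
The side-file idea could be sketched roughly as follows. Everything concrete here is an assumption for illustration: the version string, the file naming scheme (history-schema-<version>.avsc), the event names, and the helper functions. The union schema is a JSON array of record schemas, which is how Avro writes a union, and each event line is tagged with its record name in the style of Avro's JSON encoding for union values.

```python
import json
import os
import tempfile

HADOOP_VERSION = "0.21.0"  # hypothetical version string

# Hypothetical union schema: one record schema per history event type.
UNION_SCHEMA = [
    {"type": "record", "name": "JobSubmitted", "fields": [
        {"name": "jobid", "type": "string"},
        {"name": "submitTime", "type": "long"},
    ]},
    {"type": "record", "name": "JobFinished", "fields": [
        {"name": "jobid", "type": "string"},
        {"name": "finishTime", "type": "long"},
    ]},
]


def schema_path(directory, version):
    # Assumed naming convention: one schema side file per Hadoop version.
    return os.path.join(directory, "history-schema-%s.avsc" % version)


def write_schema(directory, version):
    with open(schema_path(directory, version), "w") as f:
        json.dump(UNION_SCHEMA, f)


def write_log(path, version, events):
    with open(path, "w") as f:
        f.write(version + "\n")  # first line: version only, no schema
        for event in events:
            f.write(json.dumps(event) + "\n")


def read_log(path, directory):
    with open(path) as f:
        version = f.readline().strip()
        # The version on the first line selects the matching side file.
        with open(schema_path(directory, version)) as s:
            schema = json.load(s)
        events = [json.loads(l) for l in f if l.strip()]
    return version, schema, events


with tempfile.TemporaryDirectory() as d:
    write_schema(d, HADOOP_VERSION)
    log = os.path.join(d, "job_0001.log")
    write_log(log, HADOOP_VERSION, [
        {"JobSubmitted": {"jobid": "job_0001", "submitTime": 1250000000000}},
        {"JobFinished": {"jobid": "job_0001", "finishTime": 1250000060000}},
    ])
    version, schema, events = read_log(log, d)
    print(version, len(schema), len(events))
```

The schema is stored once per version rather than once per log file, and a reader only needs the first line of a log to find it.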


> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Owen O'Malley
>            Assignee: Jothi Padmanabhan
>
> Currently, parsing the job history logs with external tools is very difficult 
> because of the format. The most critical problem is that newlines aren't 
> escaped in the strings. That makes using tools like grep, sed, and awk very 
> tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
