[ https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743161#action_12743161 ]
Sharad Agarwal commented on MAPREDUCE-157:
------------------------------------------

bq. Is the implicit schema proposed here Map<String,String>? For example, would integer values be written as JSON strings, with quotes, or as JSON integers, without quotes? If the schema is Map<String,String> and will be for all time, then there's less point to using Avro. But if fields are typed it might be nice to record the types in a schema.

I think that is a reasonable statement. Also, apart from types, we would like to have nested records, not just key-values (counter info etc.). So Avro looks like a good fit to me.

bq. We could, for example, make the first line of log files the schema, or write a side file, but there's not much point to Avro data without storing a schema.

I think a side file would be better, as it won't bloat each file with the same info. We can have a union schema comprising all history events. Perhaps the first line could just be the Hadoop version number, as the schema file would correspond to it.

> Job History log file format is not friendly for external tools.
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-157
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Owen O'Malley
> Assignee: Jothi Padmanabhan
>
> Currently, parsing the job history logs with external tools is very difficult
> because of the format. The most critical problem is that newlines aren't
> escaped in the strings. That makes using tools like grep, sed, and awk very
> tricky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
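
To make the side-file idea in the comment above concrete, here is a rough sketch of how it might look. Everything named here is an assumption for illustration only: the event names (JobSubmitted, TaskFinished), their fields, the "X.Y.Z" version placeholder, and the one-JSON-object-per-line envelope are hypothetical, and the sketch uses Python's standard json module rather than an Avro library, so it does not reproduce Avro's actual encoding. It only shows a union schema kept apart from the log, a first line carrying the version, typed values, and newlines escaped so that each event stays on one line.

```python
import io
import json

# Hypothetical side-file schema: an Avro-style union (a JSON array) with one
# record per history event. Event and field names are illustrative only.
HISTORY_SCHEMA = [
    {"type": "record", "name": "JobSubmitted", "fields": [
        {"name": "jobId", "type": "string"},
        {"name": "jobName", "type": "string"},
        {"name": "submitTime", "type": "long"},
    ]},
    {"type": "record", "name": "TaskFinished", "fields": [
        {"name": "taskId", "type": "string"},
        # Nested data (counter info) rather than flat key-values.
        {"name": "counters", "type": {"type": "map", "values": "long"}},
    ]},
]


def write_history(out, version, events):
    """Write the version as the first line, then one JSON object per event.

    json.dumps escapes newlines inside strings, so every event occupies
    exactly one physical line and stays friendly to grep, sed, and awk.
    """
    out.write(version + "\n")
    for event_type, fields in events:
        out.write(json.dumps({"type": event_type, "event": fields}) + "\n")


def read_history(inp):
    """Return (version, events) from a log produced by write_history."""
    version = inp.readline().rstrip("\n")
    events = [json.loads(line) for line in inp if line.strip()]
    return version, events


if __name__ == "__main__":
    log = io.StringIO()
    write_history(log, "X.Y.Z", [
        ("JobSubmitted", {"jobId": "job_1", "jobName": "two\nlines",
                          "submitTime": 1}),
        ("TaskFinished", {"taskId": "task_1", "counters": {"records": 10}}),
    ])
    print(json.dumps(HISTORY_SCHEMA, indent=2))  # would go to the side file
    print(log.getvalue(), end="")
```

Keeping the schema in a side file means the log itself carries only the version line and the events, which is the trade-off the comment argues for; a reader would pick the schema matching that version.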