[
https://issues.apache.org/jira/browse/HADOOP-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618344#action_12618344
]
Amar Kamat commented on HADOOP-2403:
------------------------------------
The following solutions might work here:
1) Encode/decode the strings that are externally generated (like error).
2) Add character count information to the key-val pair. Something like
{{key=length:value}}. The parser then reads exactly _length_ characters to
form the value.
3) Serialize job history information.
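To make option {{(2)}} concrete, here is a minimal sketch of the {{key=length:value}} idea. The class and method names are hypothetical and are not part of {{JobHistory}}. It assumes keys never contain '=', that pairs are separated by a single space, and that the whole record is already in memory as one string, which is exactly the part that is hard when records are read line by line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of org.apache.hadoop.mapred.JobHistory.
class LengthPrefixedRecord {

    // Writes each pair as key=length:value followed by one space.
    static String write(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            String value = e.getValue();
            sb.append(e.getKey()).append('=')
              .append(value.length()).append(':')
              .append(value).append(' ');
        }
        return sb.toString();
    }

    // Reads pairs back by trusting the length prefix instead of scanning
    // for a closing quote, so the value may contain quotes, spaces,
    // newlines or '='.
    static Map<String, String> read(String record) {
        Map<String, String> out = new LinkedHashMap<>();
        int pos = 0;
        while (pos < record.length()) {
            int eq = record.indexOf('=', pos);
            int colon = record.indexOf(':', eq);
            int len = Integer.parseInt(record.substring(eq + 1, colon));
            String key = record.substring(pos, eq);
            String value = record.substring(colon + 1, colon + 1 + len);
            out.put(key, value);
            pos = colon + 1 + len + 1; // skip the value and the separating space
        }
        return out;
    }
}
```

Note that the reader only works because it sees the full record at once. With line-based reading, a value containing a newline is cut before its length can be honoured.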
The following problems arise with {{(2)}} when error-like strings are stored in
{{JobHistory}}:
1) The whole line needs to be read before parsing and hence there is no good
way to detect the length at the key-value level.
2) A simple solution would be to use a _record-level_ length, but that has a
problem of its own:
- Currently, the code checks for a ' " ' at the end of the line before
considering the record complete. This can be wrong, because an error string can
itself contain ' " ', which might lead to premature termination. Also, the
_key-val_ pairs are split on _space_, so with _error_-like strings in the
history the split will produce wrong key-val pairs. Hence _encoding-decoding_
seems to be a better fix for all these problems.
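To illustrate the splitting problem, here is a small standalone sketch. It is not the actual {{JobHistory}} parser, and the class name is hypothetical. It only shows what a plain split on space does to a record whose error value contains spaces.

```java
// Hypothetical demo class, not taken from the Hadoop code base.
class NaiveSplitDemo {

    // Splits a history line on single spaces, as described above.
    static String[] naiveSplit(String line) {
        return line.split(" ");
    }

    public static void main(String[] args) {
        String line = "TASK_STATUS=\"FAILED\" ERROR=\"Trouble to get key\"";
        // Two key-val pairs were written, but the split yields five tokens,
        // because the error message itself contains spaces.
        for (String token : naiveSplit(line)) {
            System.out.println(token);
        }
    }
}
```

Only the first token is a well-formed pair. The remaining four are fragments of the single ERROR value.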
While encoding, one should make sure that each record is written on one line.
The record parsing algorithm then becomes:
1) Read line. Since all the entries fit in one line, there is no need to look
for record end.
2) Split the line based on _space_.
3) Split the pair on '=' to get key and value. Decode the value if required.
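The three steps above could be sketched roughly as follows. This is only an illustration: the comment does not say which encoding to use, so URL-encoding via {{java.net.URLEncoder}} is used here as a stand-in because its output contains no spaces, newlines, quotes or '='. The class and method names are hypothetical, values are written without surrounding quotes, and keys are assumed to contain neither spaces nor '='.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of org.apache.hadoop.mapred.JobHistory.
class EncodedHistoryLine {

    // Encodes a value so that it contains no space, newline, quote or '='.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes all pairs on one line as KEY=encodedValue, separated by spaces.
    static String format(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    // Step 1 is done by the caller (one line = one record).
    // Step 2: split on space. Step 3: split on the first '=' and decode.
    static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : line.split(" ")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            out.put(pair.substring(0, eq), decode(pair.substring(eq + 1)));
        }
        return out;
    }
}
```

Because every encoded value is free of spaces and line breaks, the parser no longer needs to look for a closing ' " ' to find the end of a record.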
----
Thoughts?
> JobHistory log files contain data that cannot be parsed by
> org.apache.hadoop.mapred.JobHistory
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-2403
> URL: https://issues.apache.org/jira/browse/HADOOP-2403
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Runping Qi
> Priority: Critical
>
> When some tasks fail, the job tracker writes a line to the history file
> with the error message.
> However, the error message may break the history file format, choking
> the history parser. Here is an example:
> MapAttempt TASK_TYPE="MAP" TASKID="tip_200712102254_0001_m_000090"
> TASK_ATTEMPT_ID="task_200712102254_0001_m_000090_0" TASK_STATUS="FAILED"
> FINISH_TIME="1197327293253" HOSTNAME="XXXX:50050"
> ERROR="java.lang.IllegalArgumentException: Trouble to get key or value (<,>
> substituted by null
> . Key XML-Ori:
> <Root>