[
https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896819#action_12896819
]
Hong Tang commented on MAPREDUCE-2000:
--------------------------------------
I uploaded a new patch that addresses Amar's comments.
bq. 1. Can you please add some comments as to what the regex is supposed to
do? Comments for each of the capturing groups w.r.t what are they planning to
compare/match themselves against would be good enough.
Comments added.
bq. 2. Can we reuse the regex declared in o.a.h.mapred.JobHistory? Seems
similar to me.
Yes, they are based on the regex from yhadoop 20. However, that regex is not
available in trunk, so I had to copy it over.
bq. 3. In the testcase, you could define your values in unescaped format and
use StringUtils to escape it.
I now use StringUtils's escapeString and unescapeString directly. I also added
a few more unit tests that should cover all the cases in the testcase you
wrote.
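
For readers without the patch at hand, here is a minimal sketch of the kind of regex discussed above. It is not the regex in the patch or in o.a.h.mapred.JobHistory. The class name, the sample line, and the assumption that values are double-quoted with backslash as the escape character are all illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Rumen or Hadoop.
public class KeyValueLineParser {
    // Group 1: the key, a run of word characters before '='.
    // Group 2: the quoted value, made of characters that are neither '"'
    //          nor '\', or of a backslash followed by any one character.
    private static final Pattern PAIR =
        Pattern.compile("(\\w+)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns key -> value in order of appearance; values are still escaped.
    public static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            out.put(m.group(1), m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample line: the first value contains an escaped quote.
        String line = "NAME=\"a\\\"b\" STATUS=\"OK\"";
        System.out.println(parse(line));
    }
}
```

The captured values are left in escaped form, so a caller would still need to unescape them, for example with an unescapeString-style helper.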
> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-2000
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tools/rumen
> Reporter: Hong Tang
> Assignee: Hong Tang
> Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does
> not take into account the case where the value string contains an escaped '"'.
> This leads to incorrect parsing of the remaining key=value properties in the
> same line.
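
The failure mode in the description can be reproduced with a small sketch. It assumes backslash is the escape character; the class name, method name, and sample line are hypothetical and are not taken from Rumen.

```java
// Hypothetical illustration, not Rumen code.
public class EscapedQuoteScan {
    // Returns the index of the first '"' at or after `from` that is not
    // escaped by a preceding backslash, or -1 if there is none.
    public static int indexOfUnescapedQuote(String line, int from) {
        for (int i = from; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\') {
                i++; // skip the character that the backslash escapes
            } else if (c == '"') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed sample line: NAME="a\"b" STATUS="OK"
        String line = "NAME=\"a\\\"b\" STATUS=\"OK\"";
        int start = line.indexOf('"') + 1;            // first char of the value
        int naive = line.indexOf("\"", start);        // stops at the escaped quote
        int aware = indexOfUnescapedQuote(line, start);
        System.out.println(line.substring(start, naive)); // truncated value
        System.out.println(line.substring(start, aware)); // full escaped value
    }
}
```

With the naive scan the value ends early, so everything after the escaped quote is misread as the start of the next property.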
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.