[
https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896510#action_12896510
]
Amar Kamat commented on MAPREDUCE-2000:
---------------------------------------
Using the regex makes it much cleaner. +1 for using regex.
A few comments:
# Can you please add some comments explaining what the regex is supposed to do?
A comment for each capturing group, describing what it is meant to match,
would be good enough.
# Can we reuse the regex declared in o.a.h.mapred.JobHistory? Seems similar to
me.
# In the testcase, you could define your values in unescaped format and use
{{StringUtils}} to escape them. This is how the framework does it. Here is how
the testcase might look:
{code}
line-type=// something
key=k1
value=val1 // special char content in unescaped format
line=line-type + space + key + equals + quotes + StringUtils.escape(value) +
quotes + line-delim
ParsedLine pl = new ParsedLine(line, version)
// assert
newValue = pl.get(key)
unEscapedValue = StringUtils.unescape(newValue)
assertEquals(value, unEscapedValue)
{code}
See a sample testcase [here|http://pastebin.com/2Y19v29S].
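
For illustration, the points above (commented capturing groups, and an escape/parse/unescape round trip in the test) can be sketched in a self-contained form. Everything here is an assumption made for the sketch: the class {{EscapedValueDemo}}, the regex, and the {{escape}}/{{unescape}} helpers are hypothetical stand-ins, not the patch's regex, not Rumen's {{ParsedLine}}, and not Hadoop's {{StringUtils}}. The escaping scheme is simplified to a backslash before each quote or backslash. The {{naiveValue}} method mimics the {{indexOf("\"")}} approach from the issue description to show where it stops too early.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedValueDemo {
    // Hypothetical pattern for one key="value" property.
    // Group 1: the key, a run of characters with no '=', '"' or whitespace.
    // Group 2: the still-escaped value between the quotes. Each element is
    //          either a character other than '"' and '\', or a backslash
    //          followed by any character, so an escaped quote (\") does not
    //          end the value.
    private static final Pattern KEY_VALUE = Pattern.compile(
        "([^=\"\\s]+)=\"((?:[^\"\\\\]|\\\\.)*)\"", Pattern.DOTALL);

    // Simplified escaping (an assumption): prefix '"' and '\' with '\'.
    public static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Inverse of escape(): drop each escaping backslash, keep what follows it.
    public static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                c = escaped.charAt(++i);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Collect every key="value" pair on the line; values stay escaped.
    public static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(line);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    // The approach from the issue description: end the value at the first
    // '"' found, without checking whether that quote is escaped.
    public static String naiveValue(String line, String key) {
        int start = line.indexOf(key + "=\"") + key.length() + 2;
        int end = line.indexOf("\"", start);
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        String value = "say \"hi\" \\ bye";  // unescaped, with special chars
        String line = "Job k1=\"" + escape(value) + "\" k2=\"v2\" .";

        Map<String, String> parsed = parse(line);
        String roundTripped = unescape(parsed.get("k1"));
        if (!value.equals(roundTripped)) {
            throw new AssertionError("round trip failed: " + roundTripped);
        }
        System.out.println("regex value: " + parsed.get("k1"));
        System.out.println("naive value: " + naiveValue(line, "k1"));
    }
}
```

The two alternatives inside group 2 never match the same character, so the pattern does not backtrack ambiguously. Because matching resumes after the closing quote of each property, the {{k2}} property that follows the escaped value is still parsed correctly.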
> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-2000
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tools/rumen
> Reporter: Hong Tang
> Assignee: Hong Tang
> Attachments: mr-2000-20100806.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does
> not take into account the case where the value string contains an escaped '"'.
> This leads to incorrect parsing of the remaining key=value properties in the
> same line.