[jira] Commented: (MAPREDUCE-2000) Rumen is not able to extract counters for Job history logs from Hadoop 0.20

Amar Kamat (JIRA) Mon, 09 Aug 2010 04:39:44 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896510#action_12896510
 ]


Amar Kamat commented on MAPREDUCE-2000:
---------------------------------------

Using the regex makes it much cleaner. +1 for using regex. 

A few comments:
# Can you please add comments explaining what the regex is supposed to do? 
A comment for each capturing group, stating what it is meant to match, 
would be good enough.
# Can we reuse the regex declared in o.a.h.mapred.JobHistory? Seems similar to 
me.
# In the testcase, you could define your values in unescaped format and use 
{{StringUtils}} to escape them. This is how the framework does it. Here is how 
the testcase might look:

{code}
lineType = ...   // something
key = k1
value = val1     // special-character content, in unescaped format
line = lineType + space + key + equals + quotes + StringUtils.escape(value) + quotes + lineDelim
ParsedLine pl = new ParsedLine(line, version)

// assert: the parsed value, once unescaped, must equal the original value
newValue = pl.get(key)
unescapedValue = StringUtils.unescape(newValue)
assertEquals(value, unescapedValue)
{code}

See a sample testcase [here|http://pastebin.com/2Y19v29S].
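
To make the first and third points concrete, below is a minimal, self-contained sketch of such a round-trip check. It is an illustration only, not the code from the patch: the class name, the {{KEY_VALUE}} pattern, and the {{escape}}/{{unescape}} helpers are hypothetical stand-ins (assumed here to backslash-escape only the quote and the backslash), not the framework's {{StringUtils}} or Rumen's {{ParsedLine}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not Rumen's ParsedLine or Hadoop's StringUtils.
final class EscapedValueRoundTrip {

    // Group 1: the key, a run of word characters immediately before '='.
    // Group 2: the body of the quoted value. Each step consumes either one
    //          character that is neither '"' nor '\', or a backslash plus the
    //          character it escapes, so an escaped quote never ends the value.
    static final Pattern KEY_VALUE =
        Pattern.compile("(\\w+)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Assumed escaping rule: prefix '"' and '\' with a backslash.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Inverse of escape(): drop each escaping backslash, keep what follows it.
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                c = escaped.charAt(++i);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Collect every key="value" pair on the line; values stay escaped.
    static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(line);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String value = "say \"hi\" \\ bye";   // unescaped, with special chars
        String line = "Job k1=\"" + escape(value) + "\" k2=\"v2\" .";
        Map<String, String> parsed = parse(line);
        System.out.println(value.equals(unescape(parsed.get("k1"))));
        System.out.println(parsed.get("k2"));
    }
}
```

Defining the value in unescaped form and escaping it in the test keeps the expected value readable, and checking {{k2}} as well confirms that an escaped quote in {{k1}} does not disturb the properties that follow it on the same line.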

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does 
> not take into account the case where the value string contains an escaped '"'. 
> This leads to incorrect parsing of the remaining key=value properties in the 
> same line.
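
The failure mode described above can be reproduced in a few lines. The sketch below is hypothetical and is not Rumen's actual parsing code: the class and method names are invented for illustration, and the line format is assumed to be space-separated key="value" pairs with backslash escaping.

```java
// Hypothetical illustration of the bug; not Rumen's actual code.
final class IndexOfQuoteBug {

    // Index of the first character of the value for the given key.
    static int valueStart(String line, String key) {
        String opener = key + "=\"";
        int at = line.indexOf(opener);
        if (at < 0) {
            throw new IllegalArgumentException("key not found: " + key);
        }
        return at + opener.length();
    }

    // Naive approach: the value ends at the first '"' after the opening
    // quote, even when that quote is escaped with a backslash.
    static String naiveValue(String line, String key) {
        int start = valueStart(line, key);
        int end = line.indexOf("\"", start);
        return line.substring(start, end);
    }

    // Escape-aware scan: a backslash consumes the character after it, so
    // only an unescaped '"' terminates the value.
    static String escapeAwareValue(String line, String key) {
        int start = valueStart(line, key);
        int i = start;
        while (i < line.length() && line.charAt(i) != '"') {
            i += (line.charAt(i) == '\\') ? 2 : 1;
        }
        return line.substring(start, Math.min(i, line.length()));
    }

    public static void main(String[] args) {
        String line = "NAME=\"a \\\"quoted\\\" word\" COUNT=\"7\"";
        System.out.println(naiveValue(line, "NAME"));
        System.out.println(escapeAwareValue(line, "NAME"));
    }
}
```

With this input the naive version cuts the value off at the first escaped quote, while the escape-aware scan returns the whole escaped value; a parser that resumes scanning from the naive end position would then misread the rest of the line.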

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
