[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896819#action_12896819
 ] 

Hong Tang commented on MAPREDUCE-2000:
--------------------------------------

I uploaded a new patch that addresses Amar's comments.

bq.  1.  Can you please add some comments as to what the regex is supposed to 
do? Comments for each of the capturing groups w.r.t what are they planning to 
compare/match themselves against would be good enough.
Comments added.

bq.   2. Can we reuse the regex declared in o.a.h.mapred.JobHistory? Seems 
similar to me.
Yes, they are based on the regex from yhadoop 20. However, that regex is not 
available in trunk, so I had to copy it over.

bq.   3. In the testcase, you could define your values in unescaped format and 
use StringUtils to escape it. 
Used StringUtils's escapeString and unescapeString directly. I also added a few 
more unit tests that should cover all the cases included in the testcase you 
wrote.

> Rumen is not able to extract counters for Job history logs from Hadoop 0.20
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2000
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2000
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2000-20100806.patch, mr-2000-20100809.patch
>
>
> Rumen tries to match the end of a value string through indexOf("\""). It does 
> not take into account the case where the value string contains an escaped '"'. 
> This leads to incorrect parsing of the remaining key=value properties on the 
> same line.
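
The failure mode described in the issue can be sketched as follows. This is 
only an illustration, not the code or the regex from the attached patches: the 
class name, method names, pattern, and sample line are all hypothetical. It 
contrasts an indexOf("\"")-style scan with a pattern that treats a backslash 
followed by any character as part of the value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Rumen or Hadoop.
public class EscapedValueDemo {
    // KEY="value", where value is a run of characters that are either
    // (a) neither a quote nor a backslash, or (b) a backslash plus one char.
    // Assumed pattern for illustration only.
    private static final Pattern PAIR =
        Pattern.compile("(\\w+)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Naive scan: ends each value at the next '"', escaped or not.
    public static Map<String, String> parseNaive(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            int eq = line.indexOf("=\"", pos);
            if (eq < 0) break;
            String key = line.substring(pos, eq).trim();
            int end = line.indexOf("\"", eq + 2);
            if (end < 0) break;
            out.put(key, line.substring(eq + 2, end));
            pos = end + 1;
        }
        return out;
    }

    // Escape-aware scan: an escaped quote does not terminate the value.
    // Values are returned still escaped; unescaping is a separate step.
    public static Map<String, String> parseEscapeAware(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            out.put(m.group(1), m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up line: NAME="say \"hi\" now" STATUS="OK"
        String line = "NAME=\"say \\\"hi\\\" now\" STATUS=\"OK\"";
        System.out.println(parseNaive(line));
        System.out.println(parseEscapeAware(line));
    }
}
```

On the sample line, the naive scan cuts the NAME value at the first escaped 
quote and never produces a STATUS key, while the escape-aware scan yields both 
NAME and STATUS.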

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
