[ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668547#action_12668547 ]
Mac Yang commented on HADOOP-5087:
----------------------------------
Regexes are very powerful and can provide an elegant solution to the right
problem. However, they are not the easiest thing to read and maintain.
A typical answer to the regex maintainability issue is to add detailed comments
to the regex. O'Reilly has an article on how to maintain regexes, which I found
quite useful (http://www.perl.com/pub/a/2004/01/16/regexps.html). I think we
should do something like that if we want to take the regex approach.
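For illustration, here is a minimal sketch of what a commented regex could look like in Java. The `Pattern.COMMENTS` flag (the Java counterpart of Perl's `/x`, also available as the embedded `(?x)`) ignores whitespace in the pattern and treats `#` up to the end of the line as a comment. The sketch restates the corrected ADD pattern proposed in the issue description quoted below; the class name `CommentedAddCmd` and the `parse` helper are hypothetical, not Chukwa code, and trimming the params group is an assumption made for readability.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, not part of Chukwa: the corrected ADD pattern
// rewritten in COMMENTS mode so that every piece documents itself.
public class CommentedAddCmd {
    static final Pattern ADD_CMD = Pattern.compile(
          "[aA][dD][dD] \\s+   # ADD keyword in any letter case, then whitespace\n"
        + "(\\S+)       \\s+   # group 1: adaptor class name\n"
        + "(\\S+)       \\s+   # group 2: data type, e.g. Syslog\n"
        + "(.*\\s)?            # group 3 (optional): params, must end in whitespace\n"
        + "\\s*                # any extra whitespace before the offset\n"
        + "(\\d+)              # group 4: offset, always the last token\n"
        + "\\s*                # optional trailing whitespace\n",
        Pattern.COMMENTS);

    // Returns {adaptorName, dataType, params, offset}, or null if cmd is not an ADD command.
    // Trimming params is an assumption of this sketch, not necessarily what Chukwa does.
    static String[] parse(String cmd) {
        Matcher m = ADD_CMD.matcher(cmd);
        if (!m.matches()) {
            return null;
        }
        String params = m.group(3) == null ? "" : m.group(3).trim();
        return new String[] { m.group(1), m.group(2), params, m.group(4) };
    }

    public static void main(String[] args) {
        String[] r = parse("ADD org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor"
            + " Syslog 0 my param1 param2 /var/log/messages 114027");
        System.out.println("params " + r[2]); // params 0 my param1 param2 /var/log/messages
        System.out.println("offset " + r[3]); // offset 114027
    }
}
```

The compiled pattern is identical in behavior to the one-line version; only its source is laid out so each capture group carries its own explanation.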
> Regex for Cmd parsing contains an error
> ---------------------------------------
>
> Key: HADOOP-5087
> URL: https://issues.apache.org/jira/browse/HADOOP-5087
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/chukwa
> Environment: HADOOP-4947 uses a regex to parse Chukwa commands, but there's an
> error in the regex.
> The current regex is:
> Pattern addCmdPattern =
> Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");
> It does not correctly parse this valid checkpoint entry:
> "ADD org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped Syslog 0 /var/log/messages 114027"
> Parsing result:
> adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> dataType Syslog
> params 0 /var/log/messages 11402
> offset 7
> Instead of:
> adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> dataType Syslog
> params 0 /var/log/messages
> offset 114027
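> The corrected regex changes the end of the params group from \\S to \\s, so the
> params must end in whitespace and the greedy .* can no longer swallow the
> leading digits of the offset: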
> "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*"
> Example of parsing a command with several parameters:
> "ADD org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor Syslog 0 my param1 param2 /var/log/messages 114027"
> Parsing result:
> adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor
> dataType Syslog
> params 0 my param1 param2 /var/log/messages
> offset 114027
> Reporter: Jerome Boulon
> Assignee: Jerome Boulon
> Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>
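A small, self-contained Java check can reproduce the report: it runs the checkpoint entry from the description through both the current and the corrected pattern. The class name `AddCmdRegexCheck` and its `parse` helper are hypothetical, and the assumptions that ChukwaAgent uses `Matcher.matches()` and trims the params group are made only for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check, not part of Chukwa: compares the current and corrected ADD patterns.
public class AddCmdRegexCheck {
    // Current pattern from HADOOP-4947, as quoted in this issue.
    static final Pattern BUGGY = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");
    // Corrected pattern proposed in this issue: group 3 must end in whitespace.
    static final Pattern FIXED = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*");

    // Returns {adaptorName, dataType, params, offset}, or null if cmd does not match.
    // params is trimmed here for display (an assumption, see the lead-in).
    static String[] parse(Pattern p, String cmd) {
        Matcher m = p.matcher(cmd);
        if (!m.matches()) {
            return null;
        }
        String params = m.group(3) == null ? "" : m.group(3).trim();
        return new String[] { m.group(1), m.group(2), params, m.group(4) };
    }

    public static void main(String[] args) {
        String entry = "ADD org.apache.hadoop.chukwa.datacollection.adaptor.filetailer."
            + "CharFileTailingAdaptorUTF8NewLineEscaped Syslog 0 /var/log/messages 114027";
        String[] bad = parse(BUGGY, entry);
        String[] good = parse(FIXED, entry);
        System.out.println("current: params=[" + bad[2] + "] offset=" + bad[3]);
        // current: params=[0 /var/log/messages 11402] offset=7
        System.out.println("fixed:   params=[" + good[2] + "] offset=" + good[3]);
        // fixed:   params=[0 /var/log/messages] offset=114027
    }
}
```

The root cause is the greedy, optional (.*\\S)? group: because it may end on a digit, it swallows "11402" and leaves only "7" for (\\d+). Requiring the group to end in whitespace forces the split at the last space. The same flaw would also affect a hypothetical entry with no params, such as "ADD MyAdaptor Syslog 114027", which the current pattern reads as params "11402" and offset "7", while the corrected pattern yields empty params and offset "114027".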
--
This message is automatically generated by JIRA.