[ https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668557#action_12668557 ]

Ari Rabkin commented on HADOOP-5087:
------------------------------------

Comments are good. It should be easy to split the regex into pieces with
comments, and I'm happy to do it. But we should first decide exactly what
behavior we want when there are multiple spaces between an Adaptor's
parameters and the starting offset. Which spaces belong to the parameter,
and which are discarded?
That is, suppose we have something that looks like:
       add ...FileTailingAdaptor... foo    10
Is the filename "foo", "foo    ", or something else?

This is basically a matter of taste. I vote for the former; I think Jerome
prefers the latter. Other opinions?
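To make the two options concrete, here is a minimal Java sketch that runs three patterns against the example above. The class name `SpacingDemo`, the `TRIMMED` pattern, and the placeholders `X` and `Y` (standing in for the elided adaptor class and data type) are hypothetical and not part of Chukwa. `CURRENT` and `PROPOSED` are copied from the issue description; `TRIMMED` is one possible way to get the "foo" behavior, written with `Pattern.COMMENTS` so each piece can carry its own comment. Assuming the patterns are applied with `Matcher.matches()`, the current regex yields neither answer on this input: it splits the digits and returns params `foo    1` with offset `0`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpacingDemo {
    // Regex currently in the code (from the issue description).
    static final Pattern CURRENT =
        Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");

    // Corrected regex proposed in the issue description: params end in whitespace.
    static final Pattern PROPOSED =
        Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*");

    // Hypothetical variant: params must end in a non-space character and be
    // separated from the offset by at least one whitespace character.
    // With Pattern.COMMENTS the literal spaces below are ignored by the regex.
    static final Pattern TRIMMED = Pattern.compile(
        "[aA][dD][dD] \\s+"        // command keyword, case-insensitive
            + "(\\S+) \\s+"        // group 1: adaptor class name
            + "(\\S+) \\s+"        // group 2: data type
            + "(?: (.*\\S) \\s+ )?" // group 3: optional params, trailing spaces dropped
            + "(\\d+) \\s*",       // group 4: starting offset
        Pattern.COMMENTS);

    static void show(String label, Pattern p, String cmd) {
        Matcher m = p.matcher(cmd);
        if (m.matches()) {
            // Brackets make any trailing spaces in the params group visible.
            System.out.println(label + ": params=[" + m.group(3) + "] offset=" + m.group(4));
        } else {
            System.out.println(label + ": no match");
        }
    }

    public static void main(String[] args) {
        // X and Y are placeholders for the adaptor class and data type.
        String cmd = "add X Y foo    10";
        show("current ", CURRENT, cmd);
        show("proposed", PROPOSED, cmd);
        show("trimmed ", TRIMMED, cmd);
    }
}
```

Run as written, this should print `params=[foo    1] offset=0` for the current regex, `params=[foo    ] offset=10` for the proposed one (all four spaces kept), and `params=[foo] offset=10` for the hypothetical variant. The variant makes the whole "params plus separator" unit optional, so a command with no parameters, such as `add X Y 10`, still matches with a null params group.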

> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 uses a regex to parse Chukwa commands, but
> there's an error in the regex.
> The current regex
> Pattern addCmdPattern = 
> Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");
> does not correctly parse this valid checkpoint entry:
> "ADD org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped Syslog 0 /var/log/messages 114027"
> Parsing result:
> adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> dataType Syslog
> params 0 /var/log/messages 11402
> offset 7
> Instead of:
> adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> dataType Syslog
> params 0 /var/log/messages 
> offset 114027
> The correct regex is: 
> "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*"
> Example of parsing: "ADD org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor Syslog 0 my param1 param2 /var/log/messages 114027"
> Parsing result:
> adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor
> dataType Syslog
> params 0 my param1 param2 /var/log/messages 
> offset 114027
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>
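The parsing results reported in the description can be reproduced with a short Java sketch. The class name `AddCmdRepro` and the `parse` helper are hypothetical; the two patterns and the checkpoint entry are copied from the description, assuming single spaces between fields (the email wrapping hides the exact whitespace) and that the command is matched with `Matcher.matches()`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddCmdRepro {
    // Regex from HADOOP-4947 that this issue reports as buggy.
    static final Pattern BUGGY =
        Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");

    // Corrected regex from the issue description.
    static final Pattern FIXED =
        Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*");

    // Hypothetical helper: returns {adaptorName, dataType, params, offset},
    // or null if the whole line does not match the pattern.
    static String[] parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        String entry = "ADD org.apache.hadoop.chukwa.datacollection.adaptor.filetailer."
            + "CharFileTailingAdaptorUTF8NewLineEscaped Syslog 0 /var/log/messages 114027";
        for (Pattern p : new Pattern[] {BUGGY, FIXED}) {
            String[] f = parse(p, entry);
            if (f == null) {
                System.out.println("no match");
                continue;
            }
            // Brackets make the trailing space kept by FIXED visible.
            System.out.println("params=[" + f[2] + "] offset=" + f[3]);
        }
    }
}
```

This should print `params=[0 /var/log/messages 11402] offset=7` for the buggy pattern and `params=[0 /var/log/messages ] offset=114027` for the fixed one. The greedy `(.*\S)` in the buggy pattern is allowed to end on a digit, so it swallows all but the last digit of the offset; requiring the params group to end in whitespace, as `(.*\s)` does, forces the split at the space before the offset.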


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
