[ 
https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668529#action_12668529
 ] 

Jerome Boulon commented on HADOOP-5087:
---------------------------------------

The goal of HADOOP-4947 was to make the parsing of Chukwa commands more flexible.
Moving to a regex was a good idea, but the regex needed to reproduce the previous 
parsing (6-7 simple statements) turns out to be quite complicated and will be 
difficult to extend in the future.

So, to keep things simple, should we revert to something similar to the 
initial parsing?
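
To make the suggestion concrete, here is a minimal sketch of what statement-based 
parsing could look like. It is not the original pre-HADOOP-4947 code; the class 
name SimpleAddCmdParser and its parse method are hypothetical, and it assumes the 
line layout "ADD <adaptorName> <dataType> [params...] <offset>" seen in the 
checkpoint entries quoted below.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the actual Chukwa parsing code.
class SimpleAddCmdParser {

    /**
     * Splits an ADD command on whitespace.
     * Returns {adaptorName, dataType, params, offset}, or null if the
     * line does not look like a valid ADD command.
     */
    static String[] parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Minimum is: ADD, adaptorName, dataType, offset
        if (tokens.length < 4 || !tokens[0].equalsIgnoreCase("ADD")) {
            return null;
        }
        // The offset is always the last token and must be numeric.
        String offset = tokens[tokens.length - 1];
        if (!offset.matches("\\d+")) {
            return null;
        }
        // Everything between dataType and offset is treated as params.
        String params = String.join(" ",
                Arrays.copyOfRange(tokens, 3, tokens.length - 1));
        return new String[] { tokens[1], tokens[2], params, offset };
    }

    public static void main(String[] args) {
        String line = "ADD org.apache.hadoop.chukwa.datacollection.adaptor."
                + "MySpecificAdaptor Syslog 0 my param1 param2 "
                + "/var/log/messages 114027";
        String[] r = parse(line);
        System.out.println("adaptorName " + r[0]);
        System.out.println("dataType " + r[1]);
        System.out.println("params " + r[2]);
        System.out.println("offset " + r[3]);
    }
}
```

One trade-off of this approach: runs of whitespace inside params are collapsed 
to single spaces, which the regex version preserves.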





> Regex for Cmd parsing contains an error
> ---------------------------------------
>
>                 Key: HADOOP-5087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/chukwa
>         Environment: HADOOP-4947 uses a regex to parse Chukwa commands, but 
> there's an error in the regex.
> The current regex is:
> Pattern addCmdPattern = 
> Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");
> It does not correctly parse this valid checkpoint entry:
> "ADD 
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
>  Syslog 0 /var/log/messages 114027"
> Parsing result:
> adaptorName 
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> dataType Syslog
> params 0 /var/log/messages 11402
> offset 7
> Instead of:
> adaptorName 
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> dataType Syslog
> params 0 /var/log/messages 
> offset 114027
> The correct regex is: 
> "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*"
> Example of parsing: "ADD 
> org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor Syslog 0 my 
> param1 param2 /var/log/messages 114027";
> Parsing result:
> adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor
> dataType Syslog
> params 0 my param1 param2 /var/log/messages 
> offset 114027
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>
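
The behaviour described in the quoted issue can be reproduced with a small 
standalone program. This is a sketch only: the class name RegexOffsetDemo, the 
constants BUGGY and FIXED, and the parse helper are hypothetical and are not 
part of Chukwa; the two patterns are copied from the issue text. With 
"(.*\\S)?" the greedy params group may end on any non-whitespace character, so 
it swallows all but the last digit of the offset. With "(.*\\s)?" the group 
must end on whitespace, which leaves the whole number for the offset group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are the ones quoted in HADOOP-5087.
class RegexOffsetDemo {

    // Current regex: params group ends with \S (non-whitespace).
    static final Pattern BUGGY = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");

    // Proposed regex: params group ends with \s (whitespace).
    static final Pattern FIXED = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*");

    /** Returns {adaptorName, dataType, params, offset}, or null if no match. */
    static String[] parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // Group 3 is optional, so it is null when no params are present.
        String params = m.group(3) == null ? "" : m.group(3);
        return new String[] { m.group(1), m.group(2), params, m.group(4) };
    }

    public static void main(String[] args) {
        String line = "ADD org.apache.hadoop.chukwa.datacollection.adaptor."
                + "filetailer.CharFileTailingAdaptorUTF8NewLineEscaped "
                + "Syslog 0 /var/log/messages 114027";

        String[] bad = parse(BUGGY, line);
        // params "0 /var/log/messages 11402", offset "7"
        System.out.println("buggy: params=[" + bad[2] + "] offset=" + bad[3]);

        String[] good = parse(FIXED, line);
        // params "0 /var/log/messages " (trailing space), offset "114027"
        System.out.println("fixed: params=[" + good[2] + "] offset=" + good[3]);
    }
}
```

Note that the fixed pattern keeps the trailing space in params, as in the 
parsing results quoted above, so a caller may want to trim it.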


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
