Regex for Cmd parsing contains an error
---------------------------------------

                 Key: HADOOP-5087
                 URL: https://issues.apache.org/jira/browse/HADOOP-5087
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/chukwa
         Environment: HADOOP-4947 use regex to parse chukwa commands but 
there's an error in the regex
the current regex is:
Pattern addCmdPattern = 
Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");
does not correctly parsed this valid checkpoint entry:
"ADD 
org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
 Syslog 0 /var/log/messages 114027"
Parsing result:
adaptorName 
org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
dataType Syslog
params 0 /var/log/messages 11402
offset 7

Instead of:
adaptorName 
org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
dataType Syslog
params 0 /var/log/messages 
offset 114027

The correct regex is: 
"[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*"
Example of parsing: "ADD 
org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor Syslog 0 my 
param1 param2 /var/log/messages 114027";
Parsing result:
adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor
dataType Syslog
params 0 my param1 param2 /var/log/messages 
offset 114027


            Reporter: Jerome Boulon
            Assignee: Jerome Boulon




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to