Regex for Cmd parsing contains an error
---------------------------------------
Key: HADOOP-5087
URL: https://issues.apache.org/jira/browse/HADOOP-5087
Project: Hadoop Core
Issue Type: Bug
Components: contrib/chukwa
Environment: HADOOP-4947 use regex to parse chukwa commands but
there's an error in the regex
the current regex is:
Pattern addCmdPattern =
Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");
does not correctly parsed this valid checkpoint entry:
"ADD
org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
Syslog 0 /var/log/messages 114027"
Parsing result:
adaptorName
org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
dataType Syslog
params 0 /var/log/messages 11402
offset 7
Instead of:
adaptorName
org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
dataType Syslog
params 0 /var/log/messages
offset 114027
The correct regex is:
"[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*"
Example of parsing: "ADD
org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor Syslog 0 my
param1 param2 /var/log/messages 114027";
Parsing result:
adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor
dataType Syslog
params 0 my param1 param2 /var/log/messages
offset 114027
Reporter: Jerome Boulon
Assignee: Jerome Boulon
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.