[
https://issues.apache.org/jira/browse/HADOOP-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668143#action_12668143
]
Jerome Boulon commented on HADOOP-5087:
---------------------------------------
The definition for the ADD command is:
// words should contain (space delimited):
// 0) command ("add")
// 1) AdaptorClassname
// 2) dataType (e.g. "hadoop_log")
// 3) params <optional>
// (e.g. for files, this is filename,
// but can be arbitrarily many space
// delimited agent specific params )
// 4) offset
Why remove trailing spaces from the adaptor parameters? How the parameters are
interpreted is adaptor specific, so the adaptor should take care of trimming
them. processCommand should not do it automatically, which is what
HADOOP-5087-2.patch does.
The current test cases are failing for 2 reasons:
-> There is a space on the filename; the adaptor should be fixed to handle it.
-> A test case sends some chunks to the queue but does not clean up after
itself, and the shutdown method on the agent does not do any cleanup because in
the real world the agent calls System.exit(0). The solution is to move
that test into a separate test case. Since we are forking, it will be fine.
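To make the trailing-space point concrete, here is a minimal sketch that runs the two patterns from the issue description against the checkpoint entry quoted there. The class name AddCmdRegexDemo and the helpers parse and trimParams are hypothetical and are not part of the Chukwa code base; trimParams only illustrates the kind of clean-up an adaptor could do on its own parameters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Chukwa.
public class AddCmdRegexDemo {
    // Pattern reported as broken in the issue description.
    static final Pattern BROKEN = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");

    // Pattern proposed as the fix in the issue description.
    static final Pattern FIXED = Pattern.compile(
        "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*");

    // Returns {adaptorName, dataType, params, offset}, or null if the
    // command does not match. params is null when the group is absent.
    static String[] parse(Pattern pattern, String cmd) {
        Matcher m = pattern.matcher(cmd);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    // Assumption: an adaptor that does not want surrounding whitespace
    // trims its own params instead of relying on the command parser.
    static String trimParams(String params) {
        return params == null ? "" : params.trim();
    }

    public static void main(String[] args) {
        String cmd = "ADD org.apache.hadoop.chukwa.datacollection.adaptor."
            + "filetailer.CharFileTailingAdaptorUTF8NewLineEscaped "
            + "Syslog 0 /var/log/messages 114027";

        // Broken pattern: (.*\S) swallows most of the offset digits,
        // giving params "0 /var/log/messages 11402" and offset "7".
        String[] broken = parse(BROKEN, cmd);
        System.out.println("broken params=[" + broken[2] + "] offset=" + broken[3]);

        // Fixed pattern: offset is "114027", but params keeps a trailing
        // space: "0 /var/log/messages ".
        String[] fixed = parse(FIXED, cmd);
        System.out.println("fixed  params=[" + fixed[2] + "] offset=" + fixed[3]);

        // Adaptor-side clean-up of the trailing space.
        System.out.println("trimmed params=[" + trimParams(fixed[2]) + "]");
    }
}
```

With the fixed pattern the offset comes out right, but the params group still ends in a space, which is why the trimming decision is left to the adaptor in this sketch.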
> Regex for Cmd parsing contains an error
> ---------------------------------------
>
> Key: HADOOP-5087
> URL: https://issues.apache.org/jira/browse/HADOOP-5087
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/chukwa
> Environment: HADOOP-4947 uses a regex to parse chukwa commands, but
> there's an error in the regex.
> The current regex is:
> Pattern addCmdPattern =
> Pattern.compile("[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\S)?\\s*(\\d+)\\s*");
> It does not correctly parse this valid checkpoint entry:
> "ADD
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> Syslog 0 /var/log/messages 114027"
> Parsing result:
> adaptorName
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> dataType Syslog
> params 0 /var/log/messages 11402
> offset 7
> Instead of:
> adaptorName
> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
> dataType Syslog
> params 0 /var/log/messages
> offset 114027
> The correct regex is:
> "[aA][dD][dD]\\s+(\\S+)\\s+(\\S+)\\s+(.*\\s)?\\s*(\\d+)\\s*"
> Example of parsing: "ADD
> org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor Syslog 0 my
> param1 param2 /var/log/messages 114027";
> Parsing result:
> adaptorName org.apache.hadoop.chukwa.datacollection.adaptor.MySpecificAdaptor
> dataType Syslog
> params 0 my param1 param2 /var/log/messages
> offset 114027
> Reporter: Jerome Boulon
> Assignee: Jerome Boulon
> Attachments: fixedregex.patch, HADOOP-5087-2.patch, HADOOP-5087.patch
>
>