Github user jagdeepsingh2 commented on a diff in the pull request:
https://github.com/apache/metron/pull/1245#discussion_r237699908
--- Diff: metron-platform/metron-parsers/README.md ---
@@ -52,6 +52,62 @@ There are two general types types of parsers:
This is using the default value for `wrapEntityName` if that
property is not set.
* `wrapEntityName` : Sets the name to use when wrapping JSON using
`wrapInEntityArray`. The `jsonpQuery` should reference this name.
* A field called `timestamp` is expected to exist and, if it does not,
then current time is inserted.
+ * Regular Expressions Parser
+ * `recordTypeRegex` : A regular expression to uniquely identify a
record type.
+ * `messageHeaderRegex` : A regular expression used to extract fields
from a message part which is common across all the messages.
+ * `convertCamelCaseToUnderScore` : If this property is set to true,
this parser will automatically convert all the camel case property names to
underscore separated.
+ For example, the following conversions will automatically happen:
+
+ ```
+ ipSrcAddr -> ip_src_addr
+ ipDstAddr -> ip_dst_addr
+ ipSrcPort -> ip_src_port
+ ```
+ Note this property may be necessary, because Java does not
support underscores in named group names. If your property naming
convention requires underscores in property names, use this property.
+
+ * `fields` : A JSON list of maps containing a record type to regular
expression mapping.
+
+ A complete configuration example would look like:
+
+ ```json
+ "convertCamelCaseToUnderScore": true,
+ "recordTypeRegex": "kernel|syslog",
+ "messageHeaderRegex": "(?<syslogPriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
+ "fields": [
+ {
+ "recordType": "kernel",
+ "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
+ },
+ {
+ "recordType": "syslog",
+ "regex": ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
+ }
+ ]
+ ```
+ **Note**: `messageHeaderRegex` and `regex` (within `fields`) can also be
specified as lists, e.g.
--- End diff --
The following is an example where `regex` is a list:
```
{
"recordType": "STARTSAVECONFIG",
"regex": [
".*(?<deviceName>(?<=\\s).*?(?=\\s\\d{1,7}-\\w{1,10}-\\d{1,7})).*?(?<eventInfo>(?
<=\\s\\d{1,7}\\s:\\s).*?(?=$)).*$",
".*(?<deviceName>(?<=\\s).*?(?=\\s\\d{1,7}-\\w{1,10}-\\d{1,7})).*?(?<eventInfo>(?<=\\s:\\s).*?(?=$)).*$"
]
}
```
Use a list when a particular record type can appear in more than one form.
If a record type has only one form (for example, in the case of Cisco
ASA), a list is not needed. The **regex** field can be given as a plain
string, because only a single regular expression is required for that
**recordType**.
For example:
```
{
"recordType": "APPFW APPFW_FIELDFORMAT",
"regex":
".*(?<deviceName>(?<=\\s).*?(?=\\s\\d{1,7}-\\w{1,10}-\\d{1,7})).*?(?<ipSrcAddr>(?<=\\s\\d{1,7}\\s:\\s{1,2}).*?(?=\\s)).*?(?<ipSrcPort>(?<=\\s)\\d+(?=\\-)).*?(?<path>(?<=\\-\\w{1,10}\\s).*?(?=\\s)).*?(?<status>(?<=\\s).*?(?=\\s)).*?(?<requestUri>(?<=\\s).*?(?=\\s)).*?(?<eventInfo>(?<=\\s).*?(?=\\s\\<)).*?(?<responseResultString>(?<=\\<).*?(?=\\>)).*$"
}
```
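To illustrate how a parser could treat both shapes uniformly, here is a minimal Java sketch. It is not the code in this PR: `RegexListSketch`, `toRegexList`, `extract`, and `camelToUnderscore` are hypothetical names, the sample patterns in `main` are made up, and matching with `find()` is an assumption. It normalizes a string or a list into a list, tries each pattern in order, and returns the named groups of the first one that matches, optionally renaming them from camel case to underscore form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexListSketch {

    // Finds named-group declarations such as (?<deviceName>...). Lookbehinds
    // like (?<=...) are skipped because '=' is not a letter. Simplification:
    // an escaped literal "\(?<x>" in a pattern is not handled.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Accepts the "regex" config value as either a String or a List.
    static List<String> toRegexList(Object regexConfig) {
        List<String> out = new ArrayList<>();
        if (regexConfig instanceof String) {
            out.add((String) regexConfig);
        } else if (regexConfig instanceof List) {
            for (Object o : (List<?>) regexConfig) {
                out.add(String.valueOf(o));
            }
        }
        return out;
    }

    // ipSrcAddr -> ip_src_addr
    static String camelToUnderscore(String name) {
        return name.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase(Locale.ROOT);
    }

    // Tries each pattern in order and returns the named groups of the first
    // pattern found in the message; returns an empty map if none matches.
    static Map<String, String> extract(Object regexConfig, String message, boolean convert) {
        for (String regex : toRegexList(regexConfig)) {
            Matcher m = Pattern.compile(regex).matcher(message);
            if (!m.find()) {
                continue;
            }
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher names = GROUP_NAME.matcher(regex);
            while (names.find()) {
                String name = names.group(1);
                String value = m.group(name);
                if (value != null) {
                    fields.put(convert ? camelToUnderscore(name) : name, value);
                }
            }
            return fields;
        }
        return new LinkedHashMap<>();
    }

    public static void main(String[] args) {
        // Two made-up forms of the same record type, most specific first.
        List<String> forms = Arrays.asList(
                "user=(?<userName>\\w+) port=(?<srcPort>\\d+)",
                "user=(?<userName>\\w+)");
        System.out.println(extract(forms, "user=bob port=22", true));
        System.out.println(extract(forms, "user=bob", true));
        // A single string works the same way as a one-element list.
        System.out.println(extract("id=(?<recordId>\\d+)", "id=7", false));
    }
}
```

Ordering matters in this sketch: the more specific form should come first in the list, since the first pattern that matches wins.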
---