+1, I look forward to the PR.

On Tue, Aug 28, 2018 at 8:37 AM Nick Allen <n...@nickallen.org> wrote:

> I'd love to see a PR for this. I know there are others in the community
> looking for something similar.
>
> On Sun, Aug 26, 2018 at 7:28 PM <jskar...@gmail.com> wrote:
>
> > Hello,
> >
> > We have implemented a general purpose regex parser for Metron that we are
> > interested in contributing back to the community.
> >
> > While the Metron Grok parser provides some regex-based capability today,
> > the intention of this general purpose regex parser is to:
> >
> >    1. Allow for more advanced parsing scenarios (specifically, dealing
> >       with multiple regex lines for devices that contain several log
> >       formats within them)
> >    2. Give users and developers of Metron additional options for parsing
> >    3. With the new parser chaining and regex routing feature available in
> >       Metron, this gives some additional flexibility to logically
> >       separate a flow by:
> >       1. Regex routing to segregate logs at a device level and handle
> >          envelope unwrapping
> >       2. This general purpose regex parser to parse an entire device type
> >          that contains multiple log formats within the single device (for
> >          example, RHEL logs)
> >
> > At a high level, the control flow is like this:
> >
> > 1. Identify the record type of the incoming raw message.
> >
> > 2. Find and apply the regular expression of the corresponding record type
> > to extract the fields (using named groups).
> >
> > 3. Apply the message header regex to extract the fields in the header
> > part of the message (using named groups).
> >
> > The parser config uses the following structure:
> >
> >    "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))",
> >    "messageHeaderRegex": "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
> >    "fields": [
> >      {
> >        "recordType": "kernel",
> >        "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
> >      },
> >      {
> >        "recordType": "syslog",
> >        "regex": ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
> >      }
> >    ]
> >
> > Where:
> >
> >    - recordTypeRegex is used to distinctly identify a record type. It
> >      takes a valid regular expression and may also have named groups,
> >      which are extracted into fields.
> >    - messageHeaderRegex is used to specify a regular expression that
> >      extracts fields from the part of a message that is common across
> >      all messages (e.g., syslog fields, standard headers).
> >    - fields: JSON list of objects containing recordType and regex. The
> >      expression that is evaluated is chosen based on the output of the
> >      recordTypeRegex.
> >    - Note: recordTypeRegex and messageHeaderRegex can also be specified
> >      as lists (JSON arrays), in which case the list is evaluated in
> >      order until a matching regular expression is found.
> >
> > If there are no objections to having this type of parser within Metron,
> > we will open a JIRA/PR for code review.
> >
> > *Jagdeep Singh*
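
For anyone following the thread, here is a rough Java sketch of the three-step control flow described in the quoted message. It reuses the recordTypeRegex, the messageHeaderRegex, and the kernel entry from the config (the Java string literals happen to look the same as the JSON-escaped form). The class name, the Rule helper, the parse and extract methods, and the sample log line are all assumptions made for illustration; this is not the contributed parser's code, and the syslog entry is left out to keep it short.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexParserSketch {

    // Hypothetical helper: a compiled pattern plus the named groups to read from it.
    // Group names are listed explicitly so the sketch does not rely on newer JDK APIs.
    static final class Rule {
        final Pattern pattern;
        final String[] groups;

        Rule(String regex, String... groups) {
            this.pattern = Pattern.compile(regex);
            this.groups = groups;
        }
    }

    // recordTypeRegex from the config in the thread.
    static final Rule RECORD_TYPE = new Rule(
        "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))", "process");

    // messageHeaderRegex from the config in the thread.
    static final Rule HEADER = new Rule(
        "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?"
            + "(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?"
            + "(?<syslogHost>(?<=\\s).*?(?=\\s))",
        "syslogpriority", "timestamp", "syslogHost");

    // "fields" from the config, keyed by recordType (only the kernel entry here).
    static final Map<String, Rule> FIELDS = new HashMap<>();
    static {
        FIELDS.put("kernel", new Rule(".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))", "eventInfo"));
    }

    // Copies every listed named group that matched into 'out'.
    // Returns false when the pattern is not found in the message at all.
    static boolean extract(Rule rule, String message, Map<String, String> out) {
        Matcher m = rule.pattern.matcher(message);
        if (!m.find()) {
            return false;
        }
        for (String group : rule.groups) {
            String value = m.group(group);
            if (value != null) {
                out.put(group, value);
            }
        }
        return true;
    }

    static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Step 1: identify the record type of the raw message.
        if (!extract(RECORD_TYPE, message, fields)) {
            return fields; // unknown record type, nothing extracted
        }
        // Step 2: apply the regex registered for that record type.
        Rule recordRule = FIELDS.get(fields.get("process"));
        if (recordRule != null) {
            extract(recordRule, message, fields);
        }
        // Step 3: apply the message header regex.
        extract(HEADER, message, fields);
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the thread.
        String sample = "<38>Jun 20 15:01:17 hostName kernel: usb device connected";
        System.out.println(parse(sample));
    }
}
```

The sketch treats recordTypeRegex and messageHeaderRegex as single expressions. Supporting the list form mentioned in the note would presumably mean trying each Rule in order and stopping at the first one that matches.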