+1, I look forward to the PR.

On Tue, Aug 28, 2018 at 8:37 AM Nick Allen <n...@nickallen.org> wrote:

> I'd love to see a PR for this.  I know there are others in the community
> looking for something similar.
>
> On Sun, Aug 26, 2018 at 7:28 PM <jskar...@gmail.com> wrote:
>
> > Hello,
> >
> >
> >
> > We have implemented a general purpose regex parser for Metron that we are
> > interested in contributing back to the community.
> >
> >
> >
> > While the Metron Grok parser provides some regex based capability today,
> > the intention of this general purpose regex parser is to:
> >
> >    1. Allow for more advanced parsing scenarios (specifically, dealing with
> >    multiple regex lines for devices that contain several log formats within
> >    them)
> >    2. Give users and developers of Metron additional options for parsing
> >    3. Combine with the new parser chaining and regex routing features
> >    available in Metron to give additional flexibility to logically
> >    separate a flow by:
> >       1. Using regex routing to segregate logs at a device level and handle
> >       envelope unwrapping
> >       2. Using this general purpose regex parser to parse an entire device
> >       type that contains multiple log formats within the single device
> >       (for example, RHEL logs)
> >
> >
> >
> > At a high level, the control flow is:
> >
> > 1. Identify the record type of the incoming raw message.
> >
> > 2. Find and apply the regular expression of the corresponding record type to
> > extract the fields (using named groups).
> >
> > 3. Apply the message header regex to extract the fields in the header part
> > of the message (using named groups).
> >
> >
> > The parser config uses the following structure:
> >
> >    "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))",
> >
> >    "messageHeaderRegex": "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
> >
> >    "fields": [
> >       {
> >         "recordType": "kernel",
> >         "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
> >       },
> >       {
> >         "recordType": "syslog",
> >         "regex": ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
> >       }
> >    ]
> >
> >
> >
> > Where:
> >
> >    - recordTypeRegex is used to distinctly identify a record type. It
> >    takes a valid regular expression and may also have named groups, which
> >    are extracted into fields.
> >    - messageHeaderRegex specifies a regular expression to extract fields
> >    from the part of the message that is common across all messages
> >    (e.g., syslog fields, standard headers).
> >    - fields: a JSON list of objects, each containing a recordType and a
> >    regex. The regex that is evaluated is chosen based on the output of
> >    recordTypeRegex.
> >    - Note: recordTypeRegex and messageHeaderRegex can also be specified as
> >    lists (JSON arrays), in which case the list is evaluated in order
> >    until a matching regular expression is found.
> >
> >
> >
> >
> >
> > If there are no objections to having this type of parser within Metron, we
> > will open a JIRA/PR for code review.
> >
> > *Jagdeep Singh*
> >
>
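
For anyone following along, the three-step control flow in the quoted proposal could be sketched roughly as below. This is only an illustration in Python, not the parser being contributed. The names `CONFIG`, `as_list`, `first_match` and `parse` are hypothetical, and so are the sample log lines. The patterns are simplified stand-ins for the ones in the proposal, because Python's `re` module spells named groups `(?P<name>...)` and only accepts fixed-width lookbehind. What happens when no record type matches is also an assumption: here only the header fields are returned.

```python
import re

# Hypothetical config with the same shape as the proposed one.
# Patterns are simplified and use Python's (?P<name>...) group syntax.
CONFIG = {
    "recordTypeRegex": r"(?<=\s)(?P<process>kernel|syslog)(?=\[|:)",
    "messageHeaderRegex": (
        r"^<(?P<syslogpriority>\d{1,4})>"
        r"(?P<timestamp>[A-Za-z]{3}\s{1,2}\d{1,2}\s\d{1,2}:\d{1,2}:\d{1,2})"
        r"\s(?P<syslogHost>\S+)"
    ),
    "fields": [
        {"recordType": "kernel",
         "regex": r"kernel(?:\[\d+\])?:\s*(?P<eventInfo>.*)$"},
        {"recordType": "syslog",
         "regex": r"PID\s=\s(?P<processid>\d+)\sLine"},
    ],
}


def as_list(value):
    """Accept either a single pattern or a list of patterns."""
    return value if isinstance(value, list) else [value]


def first_match(patterns, message):
    """Try the patterns in order and return the first match, or None."""
    for pattern in as_list(patterns):
        match = re.search(pattern, message)
        if match:
            return match
    return None


def parse(message, config=CONFIG):
    fields = {}

    # Step 1: identify the record type of the raw message.
    record_match = first_match(config["recordTypeRegex"], message)
    if record_match:
        fields.update(record_match.groupdict())
        record_type = record_match.group(0)

        # Step 2: apply the regex registered for that record type.
        for entry in config["fields"]:
            if entry["recordType"] == record_type:
                body_match = re.search(entry["regex"], message)
                if body_match:
                    fields.update(body_match.groupdict())
                break

    # Step 3: apply the header regex shared by all record types.
    header_match = first_match(config["messageHeaderRegex"], message)
    if header_match:
        fields.update(header_match.groupdict())
    return fields


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    print(parse("<6>Aug 26 19:28:01 host1 kernel: usb 1-1: new device found"))
```

Because `first_match` accepts a string or a list, passing a list for `recordTypeRegex` or `messageHeaderRegex` gives the "evaluate in order until one matches" behaviour described in the note.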
