Re: [DISCUSS] Generic Syslog Parsing capability for parsers

Simon Elliston Ball Tue, 20 Mar 2018 15:05:29 -0700

It seems like parser chaining is becoming a hot topic on the repo too, with 
https://github.com/apache/metron/pull/969#partial-pull-merging

I would like to discuss the option of configuring parsers to operate on the 
output of other parsers, and how we might architect it. This may also give us 
the opportunity to be more efficient in scenarios where people have large 
numbers of sources, and so use up a lot of slots on lower-volume parsers.

I have a bunch of ideas around this, but am more keen to hear what everyone 
else thinks at this stage. How should we go about fixing parser config so that 
it’s clearer (removing the need for people to reinvent the parser wheel as 
we’ve seen in a few places) and also more concise and powerful (consolidating 
the parsing of transports such as syslog and content such as application logs, 
or types of device logs)?

If this can lead to a more efficient way of handling both the syslog problem, 
and the kind of problem that leads to switching between grok statements in 
something like our ASA parser then all the better. I suspect that there might 
also be a case for multi-level chaining here too, since some things are 
embedded in multiple transports, or might have complex fields that want 
‘sub-parsing’.
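
To make the multi-level chaining idea a bit more concrete, here is a minimal sketch in Python. It is not Metron code: the names `run_chain`, `parse_kv` and `parse_json`, and the `original` field name, are all assumptions made up for illustration. Each stage names the field it reads and the parser to apply to it, so a payload embedded by one stage can be 'sub-parsed' by the next.

```python
import json
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Parser = Callable[[str], Message]


def parse_kv(text: str) -> Message:
    """Hypothetical stage: split whitespace-separated 'key=value' pairs."""
    # Assumes every token contains '='; a real parser would handle bad input.
    return dict(pair.split("=", 1) for pair in text.split())


def parse_json(text: str) -> Message:
    """Hypothetical stage: treat the field as a JSON object."""
    return json.loads(text)


def run_chain(raw: str, chain: List[Tuple[str, Parser]]) -> Message:
    """Run each (field, parser) stage against a field from earlier stages.

    The raw input is stored under 'original' (an assumed field name), so the
    first stage normally reads that field.
    """
    message: Message = {"original": raw}
    for field, parser in chain:
        payload = message.get(field)
        if not isinstance(payload, str):
            break  # the field to sub-parse is missing, so stop the chain
        message.update(parser(payload))
    return message


# Two levels: key=value pairs on the outside, JSON inside the 'payload' field.
raw = 'src=fw01 payload={"user":"alice","action":"login"}'
result = run_chain(raw, [("original", parse_kv), ("payload", parse_json)])
print(result["src"], result["user"])
```

In this sketch the chain itself is just data (a list of field and parser pairs), which is the property that would let it be driven from parser config rather than from code.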

Of course one of the key values of Metron is its speed, so maybe formalising 
some of the microbenchmarking approaches a few of us have been working on might 
help here too. I’ve got a few bits of micro-benching infrastructure around CEF 
and ASA, and I believe there’s also been some work to load and perf test things 
like enrichment that might be leveraged.
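
As a rough illustration of what a formalised microbenchmark could look like, here is a small sketch using Python's standard `timeit` module. The two parsers and the sample line are hypothetical stand-ins, not the CEF or ASA parsers; the point is only the shape of the harness: check that the candidates agree on output first, then compare best-of-N time per message.

```python
import re
import timeit
from typing import Callable, Dict

KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def parse_with_regex(line: str) -> Dict[str, str]:
    """Hypothetical candidate A: regex-based key=value extraction."""
    return dict(KV_PATTERN.findall(line))


def parse_with_split(line: str) -> Dict[str, str]:
    """Hypothetical candidate B: plain string splitting."""
    return dict(pair.split("=", 1) for pair in line.split())


def seconds_per_message(parser: Callable[[str], Dict[str, str]],
                        line: str,
                        number: int = 20000,
                        repeat: int = 5) -> float:
    """Best-of-`repeat` average time for one parse call, in seconds."""
    timings = timeit.repeat(lambda: parser(line), number=number, repeat=repeat)
    return min(timings) / number


SAMPLE = "src=10.0.0.1 dst=10.0.0.2 proto=tcp action=deny"

# Only compare speed once both candidates produce the same fields.
assert parse_with_regex(SAMPLE) == parse_with_split(SAMPLE)

for candidate in (parse_with_regex, parse_with_split):
    micros = seconds_per_message(candidate, SAMPLE) * 1e6
    print(f"{candidate.__name__}: {micros:.2f} us/message")
```

Taking the minimum over several repeats is the usual way to reduce noise from other activity on the machine; the absolute numbers will vary by host.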

Thoughts on a dev board? 

Simon

> On 20 Mar 2018, at 21:47, Otto Fowler <[email protected]> wrote:
> 
> I entered METRON-1453 <https://issues.apache.org/jira/browse/METRON-1453> a
> little while ago while working on PR#579
> <https://github.com/apache/metron/pull/579>.
> 
> "We have several parsers now, with many imaginable that are based on
> syslog, where the format is SYSLOG HEADER MESSAGE.
> 
> With message being in a different format. It would be great if we had a way
> to generically handle syslog headers, such that ANY parser data could come
> over syslog.
> 
> Either you could have a custom parser, or configure CSV or JSON such that
> they could be the payload, such that you can handle JSON over syslog by
> configuration only."
> 
> The idea would be that the parser bolt would use the configuration to
> trigger parsing the incoming message as syslog formatted, and pass the
> message part to the parser, and put the syslog parts in the message(s)
> after parsing.
> 
> As part of this I did some work on parsing syslog, using both grok and a
> DSL that I did from the spec: https://github.com/ottobackwards/grok-v-antlr
> 
> The DSL is slower, but grok cannot handle multiple structured data entries,
> and the DSL can. I’m not good enough at grok to fix it so that it is
> functionally equivalent. Another option would be to write a third parser…
> It is also possible that the DSL could be improved for speed of course.
> 
> Thoughts?
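
The flow described in the quoted message (parse the syslog header, hand the message part to the configured parser, then put the syslog parts into the result) could be sketched roughly as below. This is an illustration only, not the parser bolt, grok, or the DSL from the linked repo. It assumes RFC 5424 style headers, uses a deliberately simplified regex (for example, it does not handle an escaped `]` inside structured data), and the `syslog_` field prefix and the function name `parse_over_syslog` are made up for the example.

```python
import json
import re
from typing import Callable, Dict

# Simplified RFC 5424 header: <PRI>VERSION TIMESTAMP HOST APP PROCID MSGID SD [MSG]
HEADER = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<appname>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<structured>-|(?:\[.*?\])+)"
    r"(?: (?P<message>.*))?$",
    re.DOTALL,
)


def parse_over_syslog(line: str,
                      payload_parser: Callable[[str], Dict[str, object]]
                      ) -> Dict[str, object]:
    """Strip the syslog header, parse the payload, then merge header fields in."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("line does not match the simplified syslog pattern")
    header = match.groupdict()
    pri = int(header.pop("pri"))
    message = header.pop("message") or ""

    result = dict(payload_parser(message))
    # Prefix header fields (assumed naming) so they cannot clash with payload fields.
    result.update({"syslog_" + key: value for key, value in header.items()})
    # PRI encodes facility * 8 + severity.
    result["syslog_facility"] = pri // 8
    result["syslog_severity"] = pri % 8
    return result


# JSON over syslog, with the payload parser chosen by the caller.
line = '<34>1 2018-03-20T15:05:29Z fw01 app 1234 ID47 - {"user":"alice","action":"login"}'
event = parse_over_syslog(line, json.loads)
print(event["syslog_hostname"], event["user"])
```

Because the payload parser is passed in, swapping JSON for CSV or a custom parser would be a configuration choice rather than a new syslog-aware parser.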
