I think the chaining of parsers, or the ability to compose parsers, is a
good idea, but with reference to the PR mentioned, I would have some number
of StellarChainLinks as opposed to re-implementing Stellar in chainlinks.
Although it is NiFi-y.  But since I write Processors too, that is fine.


On March 20, 2018 at 18:05:12, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

It seems like parser chaining is becoming a hot topic on the repo too, with
https://github.com/apache/metron/pull/969#partial-pull-merging

I would like to discuss the option of configuring parsers to operate on the
output of other parsers, and how we might architect it. This may also give
us the opportunity to be more efficient in scenarios where people have
large numbers of sources, and so use up a lot of slots for lower volume
parsers, for example.

I have a bunch of ideas around this, but am more keen to hear what everyone
else thinks at this stage. How should we go about fixing parser config so
that it’s clearer (removing the need for people to reinvent the parser
wheel as we’ve seen in a few places) and also more concise and powerful
(consolidating the parsing of transports such as syslog and content such as
application logs, or types of device logs).

If this can lead to a more efficient way of handling both the syslog
problem, and the kind of problem that leads to switching between grok
statements in something like our ASA parser then all the better. I suspect
that there might also be a case for multi-level chaining here too, since
some things are embedded in multiple transports, or might have complex
fields that want ‘sub-parsing’.
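
To make the multi-level chaining idea concrete, here is a minimal Python
sketch. It is not Metron code: the parser interface, the toy parsers and
the (field, parser) link format are all hypothetical, chosen only to show a
transport parser feeding a content parser, which in turn feeds a sub-parser
for one complex field.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical parser interface: raw text in, dict of fields out.
Parser = Callable[[str], Dict[str, object]]


def parse_envelope(raw: str) -> Dict[str, object]:
    """Toy transport parser: split 'HOST PAYLOAD' on the first space."""
    host, _, payload = raw.partition(" ")
    return {"host": host, "payload": payload}


def parse_json(raw: str) -> Dict[str, object]:
    """Content parser: the payload is a JSON object."""
    return json.loads(raw)


def parse_kv(raw: str) -> Dict[str, object]:
    """Sub-parser for a complex field such as 'a=1;b=2'."""
    return dict(pair.split("=", 1) for pair in raw.split(";") if pair)


def run_chain(raw: str, first: Parser,
              links: List[Tuple[str, Parser]]) -> Dict[str, object]:
    """Run `first` on the raw message, then apply each (field, parser)
    link to the named field of the result so far, merging its output."""
    message = first(raw)
    for field, parser in links:
        value = message.get(field)
        if isinstance(value, str):
            # Replace the unparsed field with the fields parsed out of it.
            del message[field]
            message.update(parser(value))
    return message


result = run_chain(
    'fw01 {"action": "deny", "details": "src=10.0.0.1;dst=10.0.0.2"}',
    parse_envelope,
    [("payload", parse_json), ("details", parse_kv)],
)
print(result)
```

Each link only needs to know which field it consumes, so the same content
parser could in principle sit behind several transports.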

Of course one of the key values of Metron is its speed, so maybe
formalising some of the microbenchmarking approaches a few of us have been
working on might help here too. I’ve got a few bits of micro-benching
infrastructure around CEF and ASA, and I believe there’s also been some
work to load and perf test things like enrichment that might be leveraged.

Thoughts on a dev board?

Simon

> On 20 Mar 2018, at 21:47, Otto Fowler <ottobackwa...@gmail.com> wrote:
>
> I entered METRON-1453
> <https://issues.apache.org/jira/browse/METRON-1453> a little while ago
> while working on PR #579 <https://github.com/apache/metron/pull/579>.
>
> "We have several parsers now, with many imaginable that are based on
> syslog, where the format is SYSLOG HEADER MESSAGE.
>
> With message being in a different format. It would be great if we had a
> way to generically handle syslog headers, such that ANY parser data could
> come over syslog.
>
> Either you could have a custom parser, or configure CSV or JSON such that
> they could be the payload, such that you can handle JSON over syslog by
> configuration only."
>
> The idea would be that the parser bolt would use the configuration to
> trigger parsing the incoming message as syslog formatted, and pass the
> message part to the parser, and put the syslog parts in the message(s)
> after parsing.
>
> As part of this I did some work on parsing syslog, using both grok and a
> DSL that I did from the spec:
> https://github.com/ottobackwards/grok-v-antlr
>
> The DSL is slower, but grok cannot handle multiple structured data
> entries, and the DSL can. I’m not good enough at grok to fix it so that
> it is functionally equivalent. Another option would be to write a third
> parser… It is also possible that the DSL could be improved for speed of
> course.
>
> Thoughts?
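
The flow in the quoted proposal (strip the syslog header, hand the message
part to a payload parser picked by configuration, then add the syslog parts
to the result) could be sketched roughly as below. This is Python for
brevity, not the Metron parser bolt. The config keys ("payload_parser",
"columns") and the "syslog_*" field names are made up for illustration, and
the regex covers only a simplified BSD-style header
("<PRI>Mmm dd hh:mm:ss host message"), not the full syslog specs.

```python
import csv
import json
import re
from typing import Dict, List

# Simplified BSD-style syslog header; an assumption, not a full grammar.
SYSLOG_RE = re.compile(
    r"<(\d{1,3})>(\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) (\S+) (.*)"
)


def parse_csv(payload: str, columns: List[str]) -> Dict[str, object]:
    """Parse a single CSV line into a dict keyed by the configured columns."""
    row = next(csv.reader([payload]))
    return dict(zip(columns, row))


def parse_over_syslog(raw: str, config: Dict[str, object]) -> Dict[str, object]:
    """Split off the syslog header, parse the payload as configured,
    then put the syslog parts into the parsed message."""
    match = SYSLOG_RE.match(raw)
    if match is None:
        raise ValueError("not a syslog-framed message")
    pri, timestamp, host, payload = match.groups()

    kind = config["payload_parser"]  # hypothetical config key
    if kind == "json":
        message = json.loads(payload)
    elif kind == "csv":
        message = parse_csv(payload, config["columns"])
    else:
        raise ValueError(f"unknown payload parser: {kind}")

    # PRI encodes facility * 8 + severity.
    message.update({
        "syslog_facility": int(pri) // 8,
        "syslog_severity": int(pri) % 8,
        "syslog_timestamp": timestamp,
        "syslog_host": host,
    })
    return message


json_msg = parse_over_syslog(
    '<34>Oct 11 22:14:15 mymachine {"user": "bob"}',
    {"payload_parser": "json"},
)
csv_msg = parse_over_syslog(
    "<13>Feb  5 17:32:18 host2 alice,login,ok",
    {"payload_parser": "csv", "columns": ["user", "action", "status"]},
)
print(json_msg)
print(csv_msg)
```

Switching between JSON and CSV over syslog is then a configuration change
only, which is the point of the proposal.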
