On 2018-01-25 07:57, Otto Fowler wrote:
While it would be preferable for all data streamed into the parsers to
already be in ‘stream’ form rather than ‘batched’ form, that may not always
be possible, or possible at every stage of system development.

I was wondering if it would be worth adding optional support to the
JSONMap Parser for more complex documents, splitting them in the parser
into multiple messages. This is similar in function to the JSON Splitter
processor in NiFi.

So, a document would come into the JSONMap Parser from Kafka, with some
embedded set of the real message content, such as in this simplified
example:

{
    "messages" : [
        { message1},
        { message2},
        ….
        {messageN}
    ]
}

The JSONMap Parser would have a new configuration item for message
selection, which would be a JSON Path expression:

"messageSelector" : "$.messages"

The JSONMap Parser would evaluate the expression and apply the same
processing to each item in the list the expression returns.

The Parser interface already supports returning multiple message objects
from a single byte[] input.
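
To make the idea concrete, here is a rough sketch in Python rather than Metron's Java. It is not the actual parser code: `parse`, `select`, and `message_selector` are made-up names for illustration, and the dotted-path lookup only handles selectors of the form `$.key.key`, standing in for a real JSONPath evaluator.

```python
import json


def select(document, selector):
    """Resolve a dotted selector such as '$.messages' against parsed JSON.

    Only the '$.key.key' subset of JSONPath is handled here; a real
    implementation would delegate to a full JSONPath library.
    """
    selector = selector.strip()
    if not selector.startswith("$"):
        raise ValueError("selector must start with '$'")
    node = document
    # Walk one key at a time; empty segments (e.g. a bare '$') are skipped.
    for key in filter(None, selector[1:].split(".")):
        node = node[key]
    return node


def parse(raw, message_selector=None):
    """Return a list of message dicts parsed from one raw byte payload."""
    document = json.loads(raw.decode("utf-8"))
    if message_selector is None:
        # No selector configured: the whole document is a single message.
        return [document]
    selected = select(document, message_selector)
    # A selector may resolve to a single object rather than an array.
    items = selected if isinstance(selected, list) else [selected]
    # Keep only JSON objects; scalars cannot be treated as messages here.
    return [item for item in items if isinstance(item, dict)]


if __name__ == "__main__":
    raw = b'{"messages": [{"id": 1}, {"id": 2}]}'
    print(parse(raw, "$.messages"))
```

With no selector the sketch returns the whole document as one message, so the splitting stays optional.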

There is a performance penalty to be paid here, and it goes beyond the
cost of handling more than one message per input, because of the JSONPath
evaluation.

I can see this being useful in a couple of circumstances:

   - You want to work with some document format with Metron but do not have
     NiFi or the equivalent available or set up yet
   - You want to prototype with Metron before you get the ‘preprocessing’
     set up
   - You are not going to be able to use NiFi and are OK with the
     performance

I have something on GitHub to look at for more detail:
ottobackwards/json-path-play
<https://github.com/ottobackwards/json-path-play>

Thoughts?

I like this; it's the exact reason we use the NiFi splitter right now. We get 'batched' CloudTrail events which need to be split into individual events...