When the capability to parse messages with JSON Path expressions was added
to the JSONMapParser, the behavior for managing original strings was
changed.

https://github.com/apache/metron/blob/master/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/json/JSONMapParser.java#L192

A couple of issues have been reported recently regarding this change:

   1. We're losing the actual original string, which is a legal issue for
   data lineage for some customers.
   2. Even in the degenerate case where no sub-messages are created, the
   sub-message original string is modified by the
   serialization/deserialization round trip through Jackson/JsonSimple. The
   fields are reordered because the content is normalized.

I looked at options for preserving formatting, but was unable to find a
method that lets you parse the original message, query it, and then obtain
the raw string matches without the normalization introduced by
serialization/deserialization.

I'd like to propose that we add a configuration option for this parser that
allows the user to toggle which approach they'd like to use. My personal
preference, based on feedback I've gotten from multiple customers, is that
the default should be the older approach, which takes the raw original
string. It's arguable that this change in contract is a regression, so the
default should be the earlier behavior. Any sub-messages would then carry a
copy of that raw original string, not just the sub-message original string.
Enabling the flag would turn on the current sub-message original string
functionality.
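
To make the proposal concrete, here is a rough sketch of how the toggle
might behave. It is not the actual JSONMapParser code: the class name, the
method, and the config key "useSubMessageOriginalString" are all
hypothetical, and it assumes the original string is stored under the
"original_string" field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed toggle; not the real parser code. */
class OriginalStringSketch {
    // Hypothetical config key; the real name would be settled in review.
    static final String USE_SUB_MESSAGE_ORIGINAL_STRING = "useSubMessageOriginalString";

    // Assumed name of the field that carries the original string.
    static final String ORIGINAL_STRING_FIELD = "original_string";

    /**
     * Builds one message per sub-message string and attaches an original
     * string to each, chosen according to the flag in the parser config.
     */
    static List<Map<String, Object>> attachOriginalStrings(
            String rawMessage,
            List<String> subMessageStrings,
            Map<String, Object> parserConfig) {
        // Flag defaults to false, i.e. the earlier behavior (raw original string).
        boolean useSubMessage = Boolean.parseBoolean(String.valueOf(
                parserConfig.getOrDefault(USE_SUB_MESSAGE_ORIGINAL_STRING, false)));

        List<Map<String, Object>> messages = new ArrayList<>();
        for (String subMessage : subMessageStrings) {
            Map<String, Object> message = new LinkedHashMap<>();
            // Flag off: every sub-message carries a copy of the untouched raw input.
            // Flag on: each sub-message carries its own (re-serialized) string.
            message.put(ORIGINAL_STRING_FIELD, useSubMessage ? subMessage : rawMessage);
            messages.add(message);
        }
        return messages;
    }
}
```

With the flag absent or false, every emitted message gets the raw input
byte for byte, which is what the data lineage use case needs. Setting it to
true keeps the current per-sub-message behavior for anyone relying on it.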

Mike
