First:  Thanks Casey.

I submitted a review on the PR, which I will not duplicate here.

I would say however the following:

- I would like to better understand the problem we are trying to solve with
this.  This seems like a good idea, and a capability we can obviously
imagine how to implement, but there are things we need to think through.

- While adding metadata “in context” is correct (the kafka topic is in
context for the parser), I would like to talk about whether some of this
activity is more enrichment than not, and should be handled/exposed there,
where we have the splitter/joiner pattern already.

- Other than exposing the metadata, I am not sure I understand the
difference between this and just adding fields as you currently can.



On June 21, 2017 at 16:24:27, Casey Stella (ceste...@gmail.com) wrote:

Hi All,

I wanted to call attention to a JIRA (METRON-1001) that I just submitted
and possibly discuss it more broadly than on the PR.

Currently, we only ingest data in Metron. Often, there is valuable metadata
constructed up-stream of Metron that is relevant to enrichment and
cross-cuts many data formats. Take, for instance, a multi-tenancy case
where multiple sources come in and you'd like to tag the data with the
customer ID. In this case you're stuck finding ways to add the metadata to
each data source's format. Rather than do that, we should allow metadata to
be ingested along with the data associated with it.

In my mind, there are two sources of metadata relevant to support:

- User defined metadata (e.g. customer IDs)
- Environmental metadata (e.g. the actual kafka topic in the case of a
wildcard topic)

I propose the following:

- The parsers allow metadata to be exposed as stellar variables for use
in field transformations
- We use the kafka key to pass user-defined metadata in the form of a
JSON map
- We expose the kafka topic as metadata
- We allow the ability to turn on/off metadata handling
- We allow the ability to turn on/off merging metadata with data if
metadata handling is on
- This be entirely backwards compatible so parsers do not need to change.
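
To make the proposed flow concrete, here is a rough sketch in Python (not
Metron code, and not taken from the PR). It assumes the kafka key holds a
JSON map, that metadata names are prefixed with a hypothetical "metadata."
string, and that the two on/off switches are plain booleans; the function
name, parameter names and prefix are all illustrative assumptions.

```python
import json

# Hypothetical prefix used to keep metadata names apart from parsed fields.
METADATA_PREFIX = "metadata."


def ingest(fields, kafka_key, topic, read_metadata=False, merge_metadata=False):
    """Return (output_fields, metadata) for one parsed message.

    fields         -- dict produced by the parser from the message value
    kafka_key      -- raw kafka key (bytes, str or None), assumed to be a JSON map
    topic          -- the actual kafka topic the message arrived on
    read_metadata  -- switch for metadata handling as a whole
    merge_metadata -- switch for merging metadata into the output fields
    """
    output = dict(fields)
    if not read_metadata:
        # Handling is off: behave exactly as before, with no metadata at all.
        return output, {}

    metadata = {}
    if kafka_key:
        user_defined = json.loads(kafka_key)
        if isinstance(user_defined, dict):
            # User-defined metadata, e.g. a customer ID.
            for name, value in user_defined.items():
                metadata[METADATA_PREFIX + name] = value
    # Environmental metadata: the topic the message came from.
    metadata[METADATA_PREFIX + "topic"] = topic

    if merge_metadata:
        # Merging is on: metadata also becomes part of the output message.
        output.update(metadata)
    return output, metadata
```

With both switches on, a message with fields {"ip": "10.0.0.1"} and key
{"customer_id": "acme"} arriving on topic "squid" would come out carrying
metadata.customer_id and metadata.topic alongside ip. With merging off, the
same metadata is returned separately (standing in for the variables a field
transformation could read) and the output fields are unchanged. With
handling off, nothing changes at all, which is the backwards-compatible
default in this sketch.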

I've coded up a reference implementation located at
https://github.com/apache/metron/pull/621 which I will be hacking on in
reaction to this discussion.

Thoughts?
