[DISCUSS] Metadata Ingest

Casey Stella Wed, 21 Jun 2017 13:24:35 -0700

Hi All,

I wanted to call attention to a JIRA (METRON-1001) that I just submitted
and possibly discuss it more broadly than on the PR.


Currently, we only ingest data in Metron. Often, there is valuable metadata
constructed up-stream of Metron that is relevant to enrichment and
cross-cuts many data formats. Take, for instance, a multi-tenancy case
where multiple sources come in and you'd like to tag the data with the
customer ID. In this case you're stuck finding ways to add the metadata to
each data source's format. Rather than do that, we should allow metadata to
be ingested along with the data associated with it.

In my mind, there are two sources of metadata worth supporting:

   - User defined metadata (e.g. customer IDs)
   - Environmental metadata (e.g. the actual Kafka topic in the case of a
   wildcard topic)

I propose the following:

   - The parsers allow metadata to be exposed as Stellar variables for use
   in field transformations
   - We use the Kafka key to pass user-defined metadata in the form of a
   JSON map
   - We expose the Kafka topic as metadata
   - We allow metadata handling to be turned on or off
   - We allow merging metadata with data to be turned on or off if
   metadata handling is on
   - This should be entirely backwards compatible, so existing parsers do
   not need to change.
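
To make the proposal above concrete, here is a rough sketch of the flow I
have in mind. It is written in Python purely for illustration and is not the
code in the PR. Every name in it (METADATA_PREFIX, extract_metadata,
apply_metadata, and the read_metadata / merge_metadata flags) is a
hypothetical stand-in, and the choice to let parsed fields win on a name
clash is an assumption, not a settled decision.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical prefix used to namespace metadata fields (an assumption).
METADATA_PREFIX = "metadata"


def extract_metadata(key: Optional[bytes], topic: str,
                     read_metadata: bool = True) -> Dict[str, Any]:
    """Build the metadata map for one Kafka record.

    User-defined metadata comes from the record key (a JSON map);
    environmental metadata here is just the topic the record came from.
    """
    if not read_metadata:
        # Metadata handling is off: behave exactly as before.
        return {}
    metadata: Dict[str, Any] = {}
    if key:
        try:
            user_defined = json.loads(key.decode("utf-8"))
        except ValueError:
            # Covers both bad UTF-8 and bad JSON; such keys are ignored.
            user_defined = None
        if isinstance(user_defined, dict):
            for name, value in user_defined.items():
                metadata[f"{METADATA_PREFIX}.{name}"] = value
    # Set the topic last so a user-supplied "topic" entry cannot mask it.
    metadata[f"{METADATA_PREFIX}.topic"] = topic
    return metadata


def apply_metadata(message: Dict[str, Any], metadata: Dict[str, Any],
                   merge_metadata: bool = False) -> Dict[str, Any]:
    """Optionally merge metadata into the parsed message.

    Assumption: fields produced by the parser take precedence over
    metadata fields with the same name.
    """
    if not merge_metadata:
        return dict(message)
    merged = dict(metadata)
    merged.update(message)
    return merged
```

With both flags off, the message passes through unchanged, which is the
backwards-compatible case. With a key of {"customer_id": "acme"} on topic
"squid" and both flags on, the parsed message would pick up
metadata.customer_id and metadata.topic alongside its own fields.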

I've coded up a reference implementation located at
https://github.com/apache/metron/pull/621 which I will be hacking on in
reaction to this discussion.

Thoughts?
