Hi All, I wanted to call attention to a JIRA (METRON-1001) that I just submitted and possibly discuss it more broadly than on the PR.
Currently, we only ingest data in Metron. Often, there is valuable metadata constructed up-stream of Metron that is relevant to enrichment and cross-cuts many data formats. Take, for instance, a multi-tenancy case where multiple sources come in and you'd like to tag the data with the customer ID. In this case you're stuck finding ways to add the metadata to each data source's format. Rather than do that, we should allow metadata to be ingested along with the data associated with it.

In my mind, there are two sources of metadata relevant to support:
- User defined metadata (e.g. customer IDs)
- Environmental metadata (e.g. the actual kafka topic in the case of a wildcard topic)

I propose the following:
- The parsers allow metadata to be exposed as stellar variables for use in field transformations
- We use the kafka key to pass user-defined metadata in the form of a JSON map
- We expose the kafka topic as metadata
- We allow the ability to turn on/off metadata handling
- We allow the ability to turn on/off merging metadata with data if metadata handling is on
- This be entirely backwards compatible so parsers do not need to change.

I've coded up a reference implementation located at https://github.com/apache/metron/pull/621 which I will be hacking on in reaction to this discussion.

Thoughts?
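
To make the proposal above a bit more concrete, here is a rough Python sketch of the behavior I have in mind. It is only an illustration and not the code in the PR: the `metadata.` prefix, the function names, the `read_metadata` / `merge_metadata` flag names, and the rule that parsed fields win on a name collision are all assumptions made up for this example.

```python
import json

# Hypothetical prefix used to namespace metadata fields; not taken from the PR.
METADATA_PREFIX = "metadata"


def extract_metadata(key, topic, read_metadata=True):
    """Build a metadata map from the kafka key (a JSON map) and the topic.

    Returns an empty map when metadata handling is turned off.
    """
    if not read_metadata:
        return {}
    # Environmental metadata: the actual topic the message arrived on.
    metadata = {f"{METADATA_PREFIX}.topic": topic}
    if key:
        try:
            user_defined = json.loads(key)
        except ValueError:
            # Key is not valid JSON: ignore it rather than fail the message.
            user_defined = None
        if isinstance(user_defined, dict):
            for name, value in user_defined.items():
                metadata[f"{METADATA_PREFIX}.{name}"] = value
    return metadata


def apply_metadata(message, metadata, merge_metadata=False):
    """Optionally merge metadata into the parsed message.

    Assumption for this sketch: parsed fields win on a name collision.
    """
    if not merge_metadata:
        return dict(message)
    merged = dict(metadata)
    merged.update(message)
    return merged


if __name__ == "__main__":
    meta = extract_metadata(b'{"customer_id": "cust-42"}', "tenant_a_bro")
    parsed = {"ip_src_addr": "10.0.0.1"}
    print(apply_metadata(parsed, meta, merge_metadata=True))
```

With both flags off, the parsed message passes through untouched, which is what keeps existing parsers working unchanged. With metadata handling on but merging off, the metadata map would still be available to field transformations without ending up in the output message.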