Re: [DISCUSS] Metadata Ingest

Simon Elliston Ball Wed, 21 Jun 2017 20:50:46 -0700

I really like this idea. A good use case I imagine would be to have something 
like ASA data, tagged with some custom meta data (e.g. Tenant ID in a 
multi-tenant install), without having to mess with the actual parser. To that 
end, it makes sense to expose said meta data via Stellar so users can decide 
how to incorporate it into a Metron object.


That said, I think we should lay down some principles or conventions on the 
expected form of meta data, as we do with the main data fields to ensure some 
consistency across implementations, or at least get people started.

I also think we should add the functionality to our reference application docs 
showing how to set meta data in keys in NiFi. This approach certainly 
complements the way NiFi tags and thinks about meta data well, and that would 
be worth highlighting in the example.

Simon  

> On 22 Jun 2017, at 04:25, Otto Fowler <[email protected]> wrote:
> 
> First:  Thanks Casey.
> 
> I submitted a review in the PR, that I will not duplicate here.
> 
> I would say however the following:
> 
> - I would like to understand the problem we are trying to solve with this
> more.  This seems like a good idea, and a capability we obviously can
> imagine how to implement, but there are things we need to think through.
> 
> - While adding metadata “in context” is correct ( the kafka topic to the
> parser is in context ), I would like to talk about if some of this activity
> is more enrichment than not, and should be handled/exposed there, where we
> have the splitter/joiner pattern already.
> 
> - Other than exposing the metadata, I am not sure I understand the
> difference between this and just adding fields as you currently can.
> 
> 
> 
> On June 21, 2017 at 16:24:27, Casey Stella ([email protected]) wrote:
> 
> Hi All,
> 
> I wanted to call attention to a JIRA (METRON-1001) that I just submitted
> and possibly discuss it more broadly than on the PR.
> 
> Currently, we only ingest data in Metron. Often, there is valuable metadata
> constructed up-stream of Metron that is relevant to enrichment and
> cross-cuts many data formats. Take, for instance, a multi-tenancy case
> where multiple sources come in and you'd like to tag the data with the
> customer ID. In this case you're stuck finding ways to add the metadata to
> each data source's format. Rather than do that, we should allow metadata to
> be ingested along with the data associated with it.
> 
> In my mind, there are two sources of metadata relevant to support:
> 
> - User defined metadata (e.g. customer IDs)
> - Environmental metadata (e.g. the actual kafka topic in the case of a
> wildcard topic)
> 
> I propose the following:
> 
> - The parsers allow metadata to be exposed as stellar variables for use
> in field transformations
> - We use the kafka key to pass user-defined metadata in the form of a
> JSON map
> - We expose the kafka topic as metadata
> - We allow the ability to turn on/off metadata handling
> - We allow the ability to turn on/off merging metadata with data if
> metadata handling is on
> - This be entirely backwards compatible so parsers do not need to change.
> 
> I've coded up a reference implementation located at
> https://github.com/apache/metron/pull/621 which I will be hacking on in
> reaction to this discussion.
> 
> Thoughts?
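
For readers following the thread, the proposal above (a JSON map in the Kafka
key, the topic exposed as metadata, and on/off switches for handling and
merging) can be sketched roughly as follows. This is only an illustration in
plain Python, not the code in the PR: the function names, the "metadata"
prefix, and the rule that parsed fields win on a name clash are all
assumptions made for the example, and no Kafka client is involved.

```python
import json

# Hypothetical prefix for metadata field names; an assumption for this sketch.
METADATA_PREFIX = "metadata"


def encode_key(metadata):
    """Serialize user-defined metadata as a JSON map, for use as a Kafka key."""
    return json.dumps(metadata, sort_keys=True).encode("utf-8")


def extract_metadata(key, topic, enabled=True):
    """Build the metadata map from a message key and the topic it arrived on."""
    if not enabled:
        # Metadata handling switched off: behave as if no metadata exists.
        return {}
    metadata = {}
    if key:
        try:
            parsed = json.loads(key.decode("utf-8"))
        except ValueError:
            # Covers both invalid UTF-8 and invalid JSON; ignore such keys.
            parsed = None
        if isinstance(parsed, dict):
            for name, value in parsed.items():
                metadata[f"{METADATA_PREFIX}.{name}"] = value
    # Environmental metadata is set last so a user key cannot override it.
    metadata[f"{METADATA_PREFIX}.topic"] = topic
    return metadata


def merge(message, metadata, merge_enabled=True):
    """Optionally merge metadata into the parsed message."""
    if not merge_enabled:
        return dict(message)
    merged = dict(metadata)
    merged.update(message)  # assumption: parsed fields win on a name clash
    return merged


key = encode_key({"customer_id": "acme"})
meta = extract_metadata(key, topic="asa")
print(merge({"ip_src_addr": "10.0.0.1"}, meta))
```

Because the metadata is namespaced and merging is optional, a parser that
ignores it sees exactly the message it saw before, which is the
backwards-compatibility property the proposal asks for.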
