Absolutely. I would agree with that as an approach. I would also suggest we discuss where schemas and versions should be stored. Atlas? The NiFi schema registry abstraction (which limits us to Avro for expressing schemas)?
What I would like to see would be a change to the parser interfaces so that they emit field types (and the same for the enrichment stages), and then detect changes from that. The other issue to consider is forward and backward compatibility across versions. For example, if we want to output an ORC schema (I really think we should, because the current JSON-on-HDFS format is huge and slow), we need to consider the schema output history, since ORC allows schema evolution to an extent (adding fields) but not beyond that (removing or reordering fields). This can be resolved by sensible versioning and history-aware schema generation.

Simon

On 22 May 2018 at 15:23, Otto Fowler <ottobackwa...@gmail.com> wrote:

> Yes Simon, when I say 'whatever we would call the complete parse/enrich
> path' that is what I was referring to.
>
> I would think the flow would be:
>
> Save or deploy sensor configurations
> -> check if there is a difference in the configurations from last to new
> version
> -> if there is a difference that affects the 'schema' in any configuration
> -> build master schema from configurations
> -> version, store, deploy
>
> or something. I'm sure there are things about clean-slate deploy vs. new
> version deploy.
>
> On May 22, 2018 at 09:59:06, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> What I would really like to see is not a full end-to-end schema, but units
> that contribute schema. I don't want to see a parser, enrichment, and
> indexing config as one package, because in any given deployment for any
> given sensor I may have a different set of enrichments, and so need a
> different output template.
>
> What I would propose is that parsers and enrichments contribute partial
> schema (potentially expressed as Avro, but the important thing is just a
> map of fields to types) which can then be composed, and have the Metron
> platform handle creating ES templates / Solr schema / Hive HCat schema /
> A.N.Other index's schema metadata as the composite of those pieces.
> So, a parser would contribute a set of fields, the fieldTransformations on
> the sensor would contribute some fields, and each enrichment block would
> contribute some fields, at which point we have enough schema definition to
> generate all the required artefacts for whatever storage it ends up in.
>
> Essentially, composable partial schema units from each component, which
> add up at the end.
>
> Does that make sense?
>
> Simon
>
> On 22 May 2018 at 14:10, Otto Fowler <ottobackwa...@gmail.com> wrote:
>
> > We have discussed in the past as part of 777 (moment of silence…) the
> > idea that parsers/sensors (or whatever we would call the complete
> > parse/enrich path) could define their ES or Solr schemas so that they
> > can be 'installed' as part of Metron, removing the requirement for a
> > separate install, by the system or by the user, of a specific index
> > template or equivalent.
> >
> > NiFi has settled on Avro schemas to describe their 'record'-based data,
> > and it makes me wonder if we might want to think of using Avro as a
> > universal schema, or the base for one, such that we can define a schema
> > and apply it to either ES or Solr.
> >
> > Thoughts?
>
> --
> simon elliston ball
> @sireb

--
simon elliston ball
@sireb
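To make the idea concrete, here is a minimal sketch (not Metron code; all names are hypothetical) of the composition and versioning discussed above: each component contributes a partial schema as a plain map of field name to type, the platform composes them into a master schema, and a history-aware check rejects non-additive evolution, mirroring what ORC tolerates (appending fields is fine; removing, retyping, or reordering them is not).

```python
def compose_schema(*partials):
    """Merge partial field->type maps from parser, fieldTransformations,
    and enrichment blocks; a later contributor must not redefine an
    existing field with a conflicting type."""
    master = {}
    for partial in partials:
        for field, ftype in partial.items():
            if master.get(field, ftype) != ftype:
                raise ValueError(
                    f"type conflict on '{field}': {master[field]} vs {ftype}")
            master[field] = ftype
    return master


def check_evolution(previous, proposed):
    """Additive-only evolution: every previously published field must
    survive with the same type and in the same position (dicts preserve
    insertion order); only new fields may be appended at the end."""
    retyped = {f for f in previous if f in proposed
               and proposed[f] != previous[f]}
    if retyped:
        raise ValueError(f"retyped fields: {sorted(retyped)}")
    if list(proposed)[:len(previous)] != list(previous):
        raise ValueError("fields removed or reordered relative to "
                         "the previous schema version")
    return proposed


# Illustrative partial schemas (field names here are made up).
parser = {"timestamp": "long", "ip_src_addr": "string"}
transformations = {"full_hostname": "string"}
enrichment_geo = {"enrichments.geo.ip_src_addr.country": "string"}

v1 = compose_schema(parser, transformations)
v2 = check_evolution(v1, compose_schema(parser, transformations,
                                        enrichment_geo))
```

From a composed master schema like `v2` it is then mechanical to emit the per-store artefacts (ES template, Solr managed schema, HCat/ORC DDL), and the `check_evolution` gate is what keeps the ORC output history valid across sensor config changes.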