I like the flexibility and expressiveness of the first option with Stellar filters.

M

On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella <[email protected]> wrote:

> As of METRON-652 <https://github.com/apache/incubator-metron/pull/415>, we
> will have decoupled the indexing configuration from the enrichment
> configuration. As an immediate follow-up to that, I'd like to provide the
> ability to turn off and on writers via the configs. I'd like to get some
> community feedback on how the functionality should work, if y'all are
> amenable. :)
>
> As of now, we have 3 possible writers which can be used in the indexing
> topology:
>
>    - Solr
>    - Elasticsearch
>    - HDFS
>
> HDFS is always used, elasticsearch or solr is used depending on how you
> start the indexing topology.
>
> A couple of proposals come to mind immediately:
>
> *Index Filtering*
>
> You would be able to specify a filter as defined by a stellar statement
> (likely a reuse of the StellarFilter that exists in the Parsers) which
> would allow you to indicate on a message-by-message basis whether or not to
> write the message.
>
> The semantics of this would be as follows:
>
>    - Default (i.e. unspecified) is to pass everything through (hence
>    backwards compatible with the current default config).
>    - Messages which have the associated stellar statement evaluate to true
>    for the writer type will be written, otherwise not.
>
> Sample indexing config which would write out no messages to HDFS and write
> out only messages containing a field called "field1":
> {
>    "index" : "squid"
>   ,"batchSize" : 100
>   ,"filters" : {
>        "HDFS" : "false"
>       ,"ES" : "exists(field1)"
>   }
> }
>
> *Index On/Off Switch*
>
> A simpler solution would be to just provide a list of writers to write
> messages. The semantics would be as follows:
>
>    - If the list is unspecified, then the default is to write all messages
>    for every writer in the indexing topology
>    - If the list is specified, then a writer will write all messages if and
>    only if it is named in the list.
>
> Sample indexing config which turns off HDFS and keeps on Elasticsearch:
> {
>    "index" : "squid"
>   ,"batchSize" : 100
>   ,"writers" : [ "ES" ]
> }
>
> Thanks in advance for the feedback! Also, if you have any other, better
> ideas than the ones presented here, let me know too.
>
> Best,
>
> Casey
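
To make the two sets of semantics concrete, here is a rough sketch of how each proposal would decide whether a writer handles a message. It is only an illustration, not Metron code: the function names are made up, the Stellar statements are replaced by plain Python predicates (so `"field1" in msg` only approximates `exists(field1)`), and I am assuming that a writer with no entry in "filters" falls back to the pass-everything default.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, object]
Predicate = Callable[[Message], bool]

# Hypothetical stand-ins for the Stellar statements in the sample config:
#   "HDFS" : "false"          -> never write
#   "ES"   : "exists(field1)" -> write only when field1 is present
FILTERS: Dict[str, Predicate] = {
    "HDFS": lambda msg: False,
    "ES": lambda msg: "field1" in msg,
}


def should_write_filtered(
    writer: str,
    msg: Message,
    filters: Optional[Dict[str, Predicate]] = None,
) -> bool:
    """Index Filtering: decide per message, per writer.

    Assumption: a missing "filters" block, or a writer with no entry in it,
    passes every message through (the backwards-compatible default).
    """
    if not filters or writer not in filters:
        return True
    return bool(filters[writer](msg))


def should_write_listed(writer: str, writers: Optional[List[str]] = None) -> bool:
    """Index On/Off Switch: decide per writer only.

    An unspecified list enables every writer; a specified list enables
    a writer if and only if it is named in the list.
    """
    if writers is None:
        return True
    return writer in writers


if __name__ == "__main__":
    msg = {"field1": "value", "source.type": "squid"}
    for name in ("HDFS", "ES"):
        print(name, "filter:", should_write_filtered(name, msg, FILTERS))
        print(name, "switch:", should_write_listed(name, ["ES"]))
```

The difference shows up in the signatures: the filter needs the message, the switch does not. A filter of "false" reproduces the switch's "off", which is why the first option covers the second.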
