I like the flexibility and expressibility of the first option with Stellar
filters.

M

On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella <[email protected]> wrote:

> As of METRON-652 <https://github.com/apache/incubator-metron/pull/415>, we
> will have decoupled the indexing configuration from the enrichment
> configuration.  As an immediate follow-up to that, I'd like to provide the
> ability to turn off and on writers via the configs.  I'd like to get some
> community feedback on how the functionality should work, if y'all are
> amenable. :)
>
>
> As of now, we have 3 possible writers which can be used in the indexing
> topology:
>
>    - Solr
>    - Elasticsearch
>    - HDFS
>
> HDFS is always used; Elasticsearch or Solr is used depending on how you
> start the indexing topology.
>
> A couple of proposals come to mind immediately:
>
> *Index Filtering*
>
> You would be able to specify a filter as defined by a Stellar statement
> (likely a reuse of the StellarFilter that exists in the Parsers) which
> would allow you to indicate on a message-by-message basis whether or not to
> write the message.
>
> The semantics of this would be as follows:
>
>    - Default (i.e. unspecified) is to pass everything through (hence
>    backwards compatible with the current default config).
>    - A message will be written by a given writer if and only if the Stellar
>    statement associated with that writer evaluates to true for it.
>
>
> Sample indexing config which would write out no messages to HDFS and write
> out to Elasticsearch only messages containing a field called "field1":
> {
>    "index" : "squid"
>   ,"batchSize" : 100
>   ,"filters" : {
>       "HDFS" : "false"
>      ,"ES" : "exists(field1)"
>                  }
> }
>
> *Index On/Off Switch*
>
> A simpler solution would be to just provide a list of writers to write
> messages.  The semantics would be as follows:
>
>    - If the list is unspecified, then the default is to write all messages
>    for every writer in the indexing topology.
>    - If the list is specified, then a writer will write all messages if and
>    only if it is named in the list.
>
> Sample indexing config which turns off HDFS and keeps on Elasticsearch:
> {
>    "index" : "squid"
>   ,"batchSize" : 100
>   ,"writers" : [ "ES" ]
> }
>
> Thanks in advance for the feedback!  Also, if you have any other, better
> ideas than the ones presented here, let me know too.
>
> Best,
>
> Casey
>
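
For reference, the "Index Filtering" semantics quoted above can be sketched
roughly as follows. This is only an illustration, not Metron code: the
function name writers_for is hypothetical, and plain Python callables stand
in for the Stellar statements "false" and "exists(field1)", since no Stellar
evaluation is attempted here.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Predicate = Callable[[Message], bool]


def writers_for(message: Message,
                filters: Dict[str, Predicate],
                available_writers: List[str]) -> List[str]:
    """Return the writers that should write this message (hypothetical helper)."""
    selected = []
    for writer in available_writers:
        predicate = filters.get(writer)
        # No filter configured for this writer: pass everything through,
        # which keeps the current default behaviour.
        if predicate is None or predicate(message):
            selected.append(writer)
    return selected


# Python stand-ins for the Stellar statements in the sample config.
filters = {
    "HDFS": lambda msg: False,          # stands in for "false"
    "ES": lambda msg: "field1" in msg,  # stands in for "exists(field1)"
}
```

With these stand-ins, a message carrying field1 goes to ES only, a message
without it goes nowhere, and an empty filters map sends every message to
every writer.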

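The "Index On/Off Switch" semantics are simpler and could look something
like the sketch below. Again, this is an assumption-laden illustration: the
function name enabled_writers is made up, and the config is just the sample
JSON from the proposal parsed with Python's json module.

```python
import json
from typing import Dict, List


def enabled_writers(config: Dict[str, object],
                    available_writers: List[str]) -> List[str]:
    """Return the writers that are switched on (hypothetical helper)."""
    names = config.get("writers")
    if names is None:
        # List unspecified: every writer in the topology writes all messages.
        return list(available_writers)
    # List specified: a writer is on if and only if it is named in the list.
    return [w for w in available_writers if w in names]


# The sample config from the proposal: HDFS off, Elasticsearch on.
config = json.loads('{"index": "squid", "batchSize": 100, "writers": ["ES"]}')
```

Note that under these semantics an explicitly empty list would turn every
writer off, which is different from leaving the list out entirely.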