I'm all for a compromise here, and it sounds like we're getting close. Just one thing: can you lay out the reasoning for having both 'enabled' and 'when'? I don't follow it, but maybe I am missing something.
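One possible reading of the two properties in the quoted thread, sketched in Python under stated assumptions: 'enabled' is a static on/off switch that defaults to on when a writer is unspecified, and 'when' is a per-message filter. This is an illustration only, not Metron code. The names `as_bool` and `should_write` are hypothetical, and because real 'when' values would be Stellar expressions, a lookup table of plain Python predicates stands in for the Stellar evaluator.

```python
from typing import Any, Callable, Mapping

Message = Mapping[str, Any]
Predicate = Callable[[Message], bool]


def as_bool(value: Any, default: bool = True) -> bool:
    """Accept real booleans or the 'true'/'false' strings used in the thread's examples."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def should_write(writer: str,
                 config: Mapping[str, Mapping[str, Any]],
                 filters: Mapping[str, Predicate],
                 message: Message) -> bool:
    """Decide whether `writer` should receive `message`.

    Assumption: `filters` maps a 'when' expression string to a Python
    predicate; it is a stand-in for evaluating a Stellar statement.
    """
    writer_cfg = config.get(writer, {})          # unspecified writer: use defaults
    if not as_bool(writer_cfg.get("enabled")):   # static switch, on by default
        return False
    when = writer_cfg.get("when")
    if when is None:                             # no filter: write every message
        return True
    return filters[when](message)                # per-message decision


# Hypothetical config in the shape proposed in the thread.
config = {
    "elasticsearch": {"enabled": "true", "index": "foo", "when": "exists(field1)"},
    "hdfs": {"enabled": "false", "path": "/tmp/example", "batchSize": 100},
}
filters = {"exists(field1)": lambda m: "field1" in m}
```

Under this reading, 'enabled': 'false' lets the HDFS settings stay in place while the writer is off, whereas 'when' only decides message by message for a writer that is on.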

On Sat, Jan 14, 2017 at 12:13 PM, Kyle Richardson <kylerichards...@gmail.com> wrote:

> I'm +1 on the current proposal. I like Nick's syntax and agree with Jon's
> enabled property. I also like the idea of a path property for HDFS.
>
> -Kyle
>
> > On Jan 14, 2017, at 10:51 AM, Casey Stella <ceste...@gmail.com> wrote:
> >
> > I'm +1 on an explicit enabled property and a filter (or when) property. I
> > think we are zeroing in on a decent design, so that is good.
> >
> > To recap, what I am +1 on is Nick's proposed syntax with the following
> > modifications:
> > 1. An explicit enabled field
> > 2. Default to on when unspecified, to match current semantics
> >
> > Casey
> >
> >> On Sat, Jan 14, 2017 at 10:45 zeo...@gmail.com <zeo...@gmail.com> wrote:
> >>
> >> This has the additional benefit of doing something like below when you want
> >> to temporarily disable the hdfs writer, but don't want to remove the
> >> settings. This removes the need to store the path and batchSize (and many
> >> additional settings) somewhere else so they can be brought back in when you
> >> want to re-enable it, which is a nice workflow attribute for the end user:
> >>
> >> {
> >>   'elasticsearch': {
> >>     'enabled': 'true',
> >>     'index': 'foo',
> >>     'batchSize': 100
> >>   },
> >>   'hdfs': {
> >>     'enabled': 'false',
> >>     'path': '/foo/bar/...',
> >>     'batchSize': 100
> >>   },
> >>   'solr': {
> >>     'enabled': 'false'
> >>   }
> >> }
> >>
> >> Jon
> >>
> >>> On Sat, Jan 14, 2017 at 9:24 AM zeo...@gmail.com <zeo...@gmail.com> wrote:
> >>>
> >>> I similarly have a concern there because I prefer being as explicit as
> >>> possible, which makes things easier to pick up for new users. Using my
> >>> example from earlier this could look like specifying when(false), but an
> >>> even better and more obvious approach may be to use enabled(false).
> >>> So the current simple default would be:
> >>>
> >>> {
> >>>   'elasticsearch': { 'enabled': 'true' },
> >>>   'hdfs': { 'enabled': 'true' },
> >>>   'solr': { 'enabled': 'false' }
> >>> }
> >>>
> >>> And to use ES with some overrides but not HDFS or solr it would look like:
> >>>
> >>> {
> >>>   'elasticsearch': {
> >>>     'enabled': 'true',
> >>>     'index': 'foo',
> >>>     'batchSize': 100
> >>>   },
> >>>   'hdfs': {
> >>>     'enabled': 'false'
> >>>   },
> >>>   'solr': {
> >>>     'enabled': 'false'
> >>>   }
> >>> }
> >>>
> >>> Jon
> >>>
> >>> On Fri, Jan 13, 2017 at 10:21 PM Casey Stella <ceste...@gmail.com> wrote:
> >>>
> >>> One thing that I thought of that I very strenuously do not like in Nick's
> >>> proposal is that if a writer config is not specified then it is turned off
> >>> (I think; if I misunderstood let me know). In the situation where we have a
> >>> new sensor, right now if there is no index config and no enrichment
> >>> config, it still passes through to the index using defaults. In this new
> >>> scheme it would not. This changes the default semantics for the system and
> >>> I think it changes it for the worse.
> >>>
> >>> I would strongly prefer an on-by-default indexing config as we have now.
> >>>
> >>>> On Fri, Jan 13, 2017 at 17:13 Casey Stella <ceste...@gmail.com> wrote:
> >>>>
> >>>> One thing that I really like about Nick's suggestion is that it allows
> >>>> writer-specific configs in a clear and simple way. It is more complex for
> >>>> the default case (all writers write to indices named the same thing with a
> >>>> fixed batch size), which I do not like, but maybe it's worth the compromise
> >>>> to make it less complex for the advanced case.
> >>>>
> >>>> Thanks a lot for the suggestion, Nick, it's interesting; I'm beginning to
> >>>> lean your way.
> >>>>
> >>>> On Fri, Jan 13, 2017 at 2:51 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:
> >>>>
> >>>> I like the suggestions you made, Nick.
> >>>> The only thing I would add is that
> >>>> it's also nice to see an explicit when(false), as people newer to the
> >>>> platform may not know where to expect configs for the different writers.
> >>>> Being able to do it either way, which I think is already assumed in your
> >>>> model, would make sense. I would just suggest that, if we support but are
> >>>> disabling a writer, the platform inserts a default when(false) to be
> >>>> explicit.
> >>>>
> >>>> Jon
> >>>>
> >>>> On Fri, Jan 13, 2017 at 11:59 AM Casey Stella <ceste...@gmail.com> wrote:
> >>>>
> >>>>> Let me noodle on this over the weekend. Your syntax is looking less
> >>>>> onerous to me and I like the following statement from Otto: "In the end,
> >>>>> each write destination ‘type’ will need its own configuration. This is an
> >>>>> extension point."
> >>>>>
> >>>>> I may come around to your way of thinking.
> >>>>>
> >>>>> On Fri, Jan 13, 2017 at 11:57 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>>>>
> >>>>>> In the end, each write destination ‘type’ will need its own
> >>>>>> configuration. This is an extension point.
> >>>>>> {
> >>>>>>   HDFS:{
> >>>>>>     outputAdapters:[
> >>>>>>       {name: avro,
> >>>>>>        settings:{
> >>>>>>          avro stuff….
> >>>>>>        when:{
> >>>>>>       },
> >>>>>>       {
> >>>>>>        name: sequence file,
> >>>>>>        …..
> >>>>>>
> >>>>>> or some such.
> >>>>>>
> >>>>>>
> >>>>>> On January 13, 2017 at 11:51:15, Nick Allen (n...@nickallen.org) wrote:
> >>>>>>
> >>>>>> I will add also that instead of global overrides, like index, we should use
> >>>>>> configuration key names that are more appropriate to the output.
> >>>>>>
> >>>>>> For example, does 'index' really make sense for HDFS? Or would 'path' be
> >>>>>> more appropriate?
> >>>>>>
> >>>>>> {
> >>>>>>   'elasticsearch': {
> >>>>>>     'index': 'foo',
> >>>>>>     'batchSize': 1
> >>>>>>   },
> >>>>>>   'hdfs': {
> >>>>>>     'path': '/foo/bar/...',
> >>>>>>     'batchSize': 100
> >>>>>>   }
> >>>>>> }
> >>>>>>
> >>>>>> Ok, I've said my piece. Thanks for the effort in summarizing all this,
> >>>>>> Casey.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jan 13, 2017 at 11:42 AM, Nick Allen <n...@nickallen.org> wrote:
> >>>>>>
> >>>>>>>> Nick's concerns about my suggestion were that it was overly complex and
> >>>>>>>> hard to grok and that we could dispense with backwards compatibility and
> >>>>>>>> make people do a bit more work on the default case for the benefits of a
> >>>>>>>> simpler advanced case. (Nick, make sure I don't misstate your position)
> >>>>>>>
> >>>>>>>
> >>>>>>> I will add that, in my mind, the majority case would be a user
> >>>>>>> specifying the outputs, but not things like 'batchSize' or 'when'. I think
> >>>>>>> in the majority case, the user would accept whatever the default batch size
> >>>>>>> is.
> >>>>>>>
> >>>>>>> Here are alternative suggestions for all the examples that you provided
> >>>>>>> previously.
> >>>>>>>
> >>>>>>> Base Case
> >>>>>>>
> >>>>>>> - The user must always specify the 'outputs' for clarity.
> >>>>>>> - Uses default index name, batch size and when = true.
> >>>>>>>
> >>>>>>> {
> >>>>>>>   'elasticsearch': {},
> >>>>>>>   'hdfs': {}
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> Writer-non-specific Case
> >>>>>>> <https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0a1#writer-non-specific-case>
> >>>>>>>
> >>>>>>> - There are no global overrides, as in Casey's proposal.
> >>>>>>> - Easier to grok IMO.
> >>>>>>>
> >>>>>>> {
> >>>>>>>   'elasticsearch': {
> >>>>>>>     'index': 'foo',
> >>>>>>>     'batchSize': 100
> >>>>>>>   },
> >>>>>>>   'hdfs': {
> >>>>>>>     'index': 'foo',
> >>>>>>>     'batchSize': 100
> >>>>>>>   }
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> Writer-specific case without filters
> >>>>>>> <https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0a1#writer-specific-case-without-filters>
> >>>>>>>
> >>>>>>> {
> >>>>>>>   'elasticsearch': {
> >>>>>>>     'index': 'foo',
> >>>>>>>     'batchSize': 1
> >>>>>>>   },
> >>>>>>>   'hdfs': {
> >>>>>>>     'index': 'foo',
> >>>>>>>     'batchSize': 100
> >>>>>>>   }
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> Writer-specific case with filters
> >>>>>>> <https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0a1#writer-specific-case-with-filters>
> >>>>>>>
> >>>>>>> - Instead of having to say when=false, just don't configure HDFS
> >>>>>>>
> >>>>>>> {
> >>>>>>>   'elasticsearch': {
> >>>>>>>     'index': 'foo',
> >>>>>>>     'batchSize': 100,
> >>>>>>>     'when': 'exists(field1)'
> >>>>>>>   }
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Jan 13, 2017 at 11:06 AM, Casey Stella <ceste...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Dave,
> >>>>>>>> For the benefit of posterity and people who might not be as deeply
> >>>>>>>> entangled in the system as we have been, I'll recap things and hopefully
> >>>>>>>> answer your question in the process.
> >>>>>>>>
> >>>>>>>> Historically the index configuration is split between the enrichment
> >>>>>>>> configs and the global configs.
> >>>>>>>>
> >>>>>>>> - The global configs really control configs that apply to all sensors.
> >>>>>>>>   Historically this has been stuff like index connection strings, etc.
> >>>>>>>> - The sensor-specific configs control things that vary by sensor.
> >>>>>>>>
> >>>>>>>> As of Metron-652 (in review currently), we moved the sensor-specific
> >>>>>>>> configs from the enrichment configs. The proposal here is to increase the
> >>>>>>>> granularity of the sensor-specific files to make them support index
> >>>>>>>> writer-specific configs. Right now in the indexing topology, we have 2
> >>>>>>>> writers (fixed): ES/Solr and HDFS.
> >>>>>>>>
> >>>>>>>> The proposed configuration would allow you to either specify a blanket
> >>>>>>>> sensor-level config for the index name and batchSize and/or override at
> >>>>>>>> the writer level, thereby supporting a couple of use-cases:
> >>>>>>>>
> >>>>>>>> - Turning off certain index writers (e.g. HDFS)
> >>>>>>>> - Filtering the messages written to certain index writers
> >>>>>>>>
> >>>>>>>> The two competing configs between Nick and me are as follows:
> >>>>>>>>
> >>>>>>>> - I want to make sure we keep the old sensor-specific defaults with
> >>>>>>>>   writer-specific overrides available
> >>>>>>>> - Nick thought we could simplify the permutations by making the
> >>>>>>>>   indexing config only the writer-level configs.
> >>>>>>>>
> >>>>>>>> My concerns about Nick's suggestion were that the default and majority
> >>>>>>>> case, specifying the index and the batchSize for all writers (the one we
> >>>>>>>> support now), would require more configuration.
> >>>>>>>>
> >>>>>>>> Nick's concerns about my suggestion were that it was overly complex and
> >>>>>>>> hard to grok and that we could dispense with backwards compatibility and
> >>>>>>>> make people do a bit more work on the default case for the benefits of a
> >>>>>>>> simpler advanced case. (Nick, make sure I don't misstate your position).
> >>>>>>>>
> >>>>>>>> Casey
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Jan 13, 2017 at 10:54 AM, David Lyle <dlyle65...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Casey,
> >>>>>>>>>
> >>>>>>>>> Can you give me a level set of what your thinking is now? I think it's
> >>>>>>>>> global control of all index types + overrides on a per-type basis. Fwiw,
> >>>>>>>>> I'm totally for that, but I want to make sure I'm not imposing my
> >>>>>>>>> preconceived notions on your consensus-driven ones.
> >>>>>>>>>
> >>>>>>>>> -D....
> >>>>>>>>>
> >>>>>>>>> On Fri, Jan 13, 2017 at 10:44 AM, Casey Stella <ceste...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> I am suggesting that, yes. The configs are essentially the same as yours,
> >>>>>>>>>> except there is an override specified at the top level. Without that, in
> >>>>>>>>>> order to specify that both HDFS and ES have batch sizes of 100, you have to
> >>>>>>>>>> explicitly configure each. It's less that I'm trying to have backwards
> >>>>>>>>>> compatibility and more that I'm trying to make the majority case easy: both
> >>>>>>>>>> writers write everything to a specified index name with a specified batch
> >>>>>>>>>> size (which is what we have now). Beyond that, I want to allow for
> >>>>>>>>>> specifying an override for the config on a writer-by-writer basis for those
> >>>>>>>>>> who need it.
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jan 13, 2017 at 10:39 AM, Nick Allen <n...@nickallen.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Are you saying we support all of these variants? I realize you are trying
> >>>>>>>>>>> to have some backwards compatibility, but this also makes it harder for a
> >>>>>>>>>>> user to grok (for me at least).
> >>>>>>>>>>>
> >>>>>>>>>>> Personally I like my original example as there are fewer sub-structures,
> >>>>>>>>>>> like 'writerConfig', which makes the whole thing simpler and easier to
> >>>>>>>>>>> grok. But maybe others will think your proposal is just as easy to grok.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jan 13, 2017 at 10:01 AM, Casey Stella <ceste...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Ok, so here's what I'm thinking based on the discussion:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Keeping the configs that we have now (batchSize and index) as
> >>>>>>>>>>>>   defaults for the unspecified writer-specific case
> >>>>>>>>>>>> - Adding the config Nick suggested
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Base Case*:
> >>>>>>>>>>>> {
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> - all writers write all messages
> >>>>>>>>>>>> - index named the same as the sensor for all writers
> >>>>>>>>>>>> - batchSize of 1 for all writers
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Writer-non-specific case*:
> >>>>>>>>>>>> {
> >>>>>>>>>>>>   "index" : "foo"
> >>>>>>>>>>>>  ,"batchSize" : 100
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> - All writers write all messages
> >>>>>>>>>>>> - index is named "foo", different from the sensor for all writers
> >>>>>>>>>>>> - batchSize is 100 for all writers
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Writer-specific case without filters*
> >>>>>>>>>>>> {
> >>>>>>>>>>>>   "index" : "foo"
> >>>>>>>>>>>>  ,"batchSize" : 1
> >>>>>>>>>>>>  ,"writerConfig" :
> >>>>>>>>>>>>   {
> >>>>>>>>>>>>     "elasticsearch" : {
> >>>>>>>>>>>>       "batchSize" : 100
> >>>>>>>>>>>>     }
> >>>>>>>>>>>>   }
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> - All writers write all messages
> >>>>>>>>>>>> - index is named "foo", different from the sensor for all writers
> >>>>>>>>>>>> - batchSize is 1 for HDFS and 100 for elasticsearch writers
> >>>>>>>>>>>> - NOTE: I could override the index name too
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Writer-specific case with filters*
> >>>>>>>>>>>> {
> >>>>>>>>>>>>   "index" : "foo"
> >>>>>>>>>>>>  ,"batchSize" : 1
> >>>>>>>>>>>>  ,"writerConfig" :
> >>>>>>>>>>>>   {
> >>>>>>>>>>>>     "elasticsearch" : {
> >>>>>>>>>>>>       "batchSize" : 100,
> >>>>>>>>>>>>       "when" : "exists(field1)"
> >>>>>>>>>>>>     },
> >>>>>>>>>>>>     "hdfs" : {
> >>>>>>>>>>>>       "when" : "false"
> >>>>>>>>>>>>     }
> >>>>>>>>>>>>   }
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> - ES writer writes messages which have field1, HDFS doesn't
> >>>>>>>>>>>> - index is named "foo", different from the sensor for all writers
> >>>>>>>>>>>> - batchSize is 100 for elasticsearch writers
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thoughts?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jan 13, 2017 at 9:44 AM, Carolyn Duby <cd...@hortonworks.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> For larger installations you need to control what is indexed so you don’t
> >>>>>>>>>>>>> end up with a nasty elastic search situation and so you can mine the data
> >>>>>>>>>>>>> later for reports and training ml models.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>> Carolyn
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 1/13/17, 9:40 AM, "Casey Stella" <ceste...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> OH that's a good idea!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jan 13, 2017 at 9:39 AM, Nick Allen <n...@nickallen.org> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I like the "Index Filtering" option based on the flexibility that it
> >>>>>>>>>>>>>>> provides. Should each output (HDFS, ES, etc) have its own configuration
> >>>>>>>>>>>>>>> settings?
> >>>>>>>>>>>>>>> For example, aren't things like batching handled separately for
> >>>>>>>>>>>>>>> HDFS versus Elasticsearch?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Something along the lines of...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>   "hdfs" : {
> >>>>>>>>>>>>>>>     "when": "exists(field1)",
> >>>>>>>>>>>>>>>     "batchSize": 100
> >>>>>>>>>>>>>>>   },
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>   "elasticsearch" : {
> >>>>>>>>>>>>>>>     "when": "true",
> >>>>>>>>>>>>>>>     "batchSize": 1000,
> >>>>>>>>>>>>>>>     "index": "squid"
> >>>>>>>>>>>>>>>   }
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Jan 13, 2017 at 9:10 AM, Casey Stella <ceste...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Yeah, I tend to like the first option too. Any opposition to that from
> >>>>>>>>>>>>>>>> anyone?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The points brought up are good ones and I think that it may be worth a
> >>>>>>>>>>>>>>>> broader discussion of the requirements of indexing in a separate dev list
> >>>>>>>>>>>>>>>> thread. Maybe a list of desires with coherent use-cases justifying them so
> >>>>>>>>>>>>>>>> we can think about how this stuff should work and where the natural
> >>>>>>>>>>>>>>>> extension points should be. After all, we need to toe the line between
> >>>>>>>>>>>>>>>> engineering and overengineering for features nobody will want.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm not sure about the extensions to the standard fields.
> >>>>>>>>>>>>>>>> I'm torn between
> >>>>>>>>>>>>>>>> the notions that we should have no standard fields vs we should have a
> >>>>>>>>>>>>>>>> boatload of standard fields (with most of them empty). I exchange
> >>>>>>>>>>>>>>>> positions fairly regularly on that question. ;) It may be worth a dev list
> >>>>>>>>>>>>>>>> discussion to lay out how you imagine an extension of standard fields and
> >>>>>>>>>>>>>>>> how it might look as implemented in Metron.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Casey
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 9:58 PM, Kyle Richardson <kylerichards...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'll second my preference for the first option. I think the ability to
> >>>>>>>>>>>>>>>>> use Stellar filters to customize indexing would be a big win.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'm glad Matt brought up the point about data lake and CEP. I think this
> >>>>>>>>>>>>>>>>> is a really important use case that we need to consider. Take a simple
> >>>>>>>>>>>>>>>>> example...
> >>>>>>>>>>>>>>>>> If I have data coming in from 3 different firewall vendors and 2
> >>>>>>>>>>>>>>>>> different web proxy/url filtering vendors and I want to be able to analyze
> >>>>>>>>>>>>>>>>> that data set, I need the data to be indexed all together (likely in HDFS)
> >>>>>>>>>>>>>>>>> and to have a normalized schema such that IP address, URL, and user name
> >>>>>>>>>>>>>>>>> (to take a few) can be easily queried and aggregated. I can also envision
> >>>>>>>>>>>>>>>>> scenarios where I would want to index data based on attributes other than
> >>>>>>>>>>>>>>>>> sensor, business unit or subsidiary for example.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I've been wanting to propose extending our 7 standard fields to include
> >>>>>>>>>>>>>>>>> things like URL and user. Is there community interest/support for moving in
> >>>>>>>>>>>>>>>>> that direction? If so, I'll start a new thread.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks!
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Kyle
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 6:51 PM, Matt Foley <ma...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Ah, I see. If overriding the default index name allows using the same
> >>>>>>>>>>>>>>>>>> name for multiple sensors, then the goal can be achieved.
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>> --Matt
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 1/12/17, 3:30 PM, "Casey Stella" <ceste...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Oh, you could! Let's say you have a syslog parser with data from
> >>>>>>>>>>>>>>>>>> sources 1, 2 and 3. You'd end up with one kafka queue with 3 parsers
> >>>>>>>>>>>>>>>>>> attached to that queue, each picking apart the messages from source 1, 2
> >>>>>>>>>>>>>>>>>> and 3. They'd go through separate enrichment and into the indexing
> >>>>>>>>>>>>>>>>>> topology. In the indexing topology, you could specify the same index name
> >>>>>>>>>>>>>>>>>> "syslog" and all of the messages go into the same index for CEP querying
> >>>>>>>>>>>>>>>>>> if so desired.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 6:27 PM, Matt Foley <ma...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Syslog is hell on parsers – I know, I worked at LogLogic in a previous
> >>>>>>>>>>>>>>>>>>> life. It makes perfect sense to route different lines from syslog through
> >>>>>>>>>>>>>>>>>>> different appropriate parsers.
> >>>>>>>>>>>>>>>>>>> But a lot of what the parsers do is
> >>>>>>>>>>>>>>>>>>> identify consistent subsets of metadata and annotate it – eg,
> >>>>>>>>>>>>>>>>>>> src_ip_addr, event timestamps, etc. Once those metadata are annotated and
> >>>>>>>>>>>>>>>>>>> available with common field names, why doesn’t it make sense to index the
> >>>>>>>>>>>>>>>>>>> messages together, for CEP querying? I think Splunk has illustrated this
> >>>>>>>>>>>>>>>>>>> model.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On 1/12/17, 3:00 PM, "Casey Stella" <ceste...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> yeah, I mean, honestly, I think the approach that we've taken for
> >>>>>>>>>>>>>>>>>>> sources which aggregate different types of data is to provide filters at
> >>>>>>>>>>>>>>>>>>> the parser level and have multiple parser topologies (with different,
> >>>>>>>>>>>>>>>>>>> possibly mutually exclusive filters) running. This would be a completely
> >>>>>>>>>>>>>>>>>>> separate sensor. Imagine a syslog data source that aggregates and you
> >>>>>>>>>>>>>>>>>>> want to pick apart certain pieces of messages.
> >>>>>>>>>>>>>>>>>>> This is why the initial thought and
> >>>>>>>>>>>>>>>>>>> architecture was one index per sensor.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 5:55 PM, Matt Foley <ma...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I’m thinking that CEP (Complex Event Processing) is contrary to the
> >>>>>>>>>>>>>>>>>>>> idea of silo-ing data per sensor.
> >>>>>>>>>>>>>>>>>>>> Now it’s true that some of those sensors are already aggregating data
> >>>>>>>>>>>>>>>>>>>> from multiple sources, so maybe I’m wrong here.
> >>>>>>>>>>>>>>>>>>>> But it just seems to me that the “data lake” insights come from being
> >>>>>>>>>>>>>>>>>>>> able to make decisions over the whole mass of data rather than just
> >>>>>>>>>>>>>>>>>>>> vertical slices of it.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On 1/12/17, 2:15 PM, "Casey Stella" <ceste...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hey Matt,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the comment!
> >>>>>>>>>>>>>>>>>>>> 1. At the moment, we only have one index name, the default of which is
> >>>>>>>>>>>>>>>>>>>> the sensor name but that's entirely up to the user. This is sensor
> >>>>>>>>>>>>>>>>>>>> specific, so it'd be a separate config for each sensor.
> >>>>>>>>>>>>>>>>>>>> If we want to build multiple indices per sensor, we'd have to think
> >>>>>>>>>>>>>>>>>>>> carefully about how to do that and it would be a bigger undertaking. I
> >>>>>>>>>>>>>>>>>>>> guess I can see the use, though (redirect messages to one index vs
> >>>>>>>>>>>>>>>>>>>> another based on a predicate for a given sensor).
> >>>>>>>>>>>>>>>>>>>> Anyway, not where I was originally thinking that this discussion would
> >>>>>>>>>>>>>>>>>>>> go, but it's an interesting point.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> 2. I hadn't thought through the implementation quite yet, but we don't
> >>>>>>>>>>>>>>>>>>>> actually have a splitter bolt in that topology, just a spout that goes to
> >>>>>>>>>>>>>>>>>>>> the elasticsearch writer and also to the hdfs writer.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 4:52 PM, Matt Foley <ma...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Casey, good to have controls like this. Couple questions:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 1. Regarding the “index” : “squid” name/value pair, is the index name
> >>>>>>>>>>>>>>>>>>>>> expected to always be a sensor name? Or is the given json structure
> >>>>>>>>>>>>>>>>>>>>> subordinate to a sensor name in zookeeper? Or can we build arbitrary
> >>>>>>>>>>>>>>>>>>>>> indexes with this new specification, independent of sensor? Should there
> >>>>>>>>>>>>>>>>>>>>> actually be a list of “indexes”, ie
> >>>>>>>>>>>>>>>>>>>>> { “indexes” : [
> >>>>>>>>>>>>>>>>>>>>>     {“index” : “name1”,
> >>>>>>>>>>>>>>>>>>>>>      …
> >>>>>>>>>>>>>>>>>>>>>     },
> >>>>>>>>>>>>>>>>>>>>>     {“index” : “name2”,
> >>>>>>>>>>>>>>>>>>>>>      …
> >>>>>>>>>>>>>>>>>>>>>     } ]
> >>>>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 2. Would the filtering / writer selection logic take place in the
> >>>>>>>>>>>>>>>>>>>>> indexing topology splitter bolt? Seems like that would have the smallest
> >>>>>>>>>>>>>>>>>>>>> impact on current implementation, no?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Sorry if these are already answered in PR-415, I haven’t had time to
> >>>>>>>>>>>>>>>>>>>>> review that one yet.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>> --Matt
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On 1/12/17, 12:55 PM, "Michael Miklavcic"
> >>>>>>>>>>>>>>>>>>>>> <michael.miklav...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>     I like the flexibility and expressibility of the first
> >>>>>>>>>>>>>>>>>>>>>     option with Stellar filters.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>     M
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>     On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella
> >>>>>>>>>>>>>>>>>>>>>     <ceste...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> As of METRON-652
> >>>>>>>>>>>>>>>>>>>>>> <https://github.com/apache/incubator-metron/pull/415>, we
> >>>>>>>>>>>>>>>>>>>>>> will have decoupled the indexing configuration from the
> >>>>>>>>>>>>>>>>>>>>>> enrichment configuration. As an immediate follow-up to that,
> >>>>>>>>>>>>>>>>>>>>>> I'd like to provide the ability to turn writers on and off
> >>>>>>>>>>>>>>>>>>>>>> via the configs. I'd like to get some community feedback on
> >>>>>>>>>>>>>>>>>>>>>> how the functionality should work, if y'all are amenable. :)
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> As of now, we have 3 possible writers which can be used in
> >>>>>>>>>>>>>>>>>>>>>> the indexing topology:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>    - Solr
> >>>>>>>>>>>>>>>>>>>>>>    - Elasticsearch
> >>>>>>>>>>>>>>>>>>>>>>    - HDFS
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> HDFS is always used; Elasticsearch or Solr is used depending
> >>>>>>>>>>>>>>>>>>>>>> on how you start the indexing topology.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> A couple of proposals come to mind immediately:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> *Index Filtering*
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> You would be able to specify a filter as defined by a Stellar
> >>>>>>>>>>>>>>>>>>>>>> statement (likely a reuse of the StellarFilter that exists in
> >>>>>>>>>>>>>>>>>>>>>> the Parsers) which would allow you to indicate on a
> >>>>>>>>>>>>>>>>>>>>>> message-by-message basis whether or not to write the message.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The semantics of this would be as follows:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>    - Default (i.e. unspecified) is to pass everything through
> >>>>>>>>>>>>>>>>>>>>>>    (hence backwards compatible with the current default
> >>>>>>>>>>>>>>>>>>>>>>    config).
> >>>>>>>>>>>>>>>>>>>>>>    - Messages for which the associated Stellar statement
> >>>>>>>>>>>>>>>>>>>>>>    evaluates to true for the writer type will be written;
> >>>>>>>>>>>>>>>>>>>>>>    otherwise they will not.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Sample indexing config which would write out no messages to
> >>>>>>>>>>>>>>>>>>>>>> HDFS and write out to Elasticsearch only messages containing
> >>>>>>>>>>>>>>>>>>>>>> a field called "field1":
> >>>>>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>>>>    "index" : "squid"
> >>>>>>>>>>>>>>>>>>>>>>   ,"batchSize" : 100
> >>>>>>>>>>>>>>>>>>>>>>   ,"filters" : {
> >>>>>>>>>>>>>>>>>>>>>>        "HDFS" : "false"
> >>>>>>>>>>>>>>>>>>>>>>       ,"ES" : "exists(field1)"
> >>>>>>>>>>>>>>>>>>>>>>                }
> >>>>>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> *Index On/Off Switch*
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> A simpler solution would be to just provide a list of writers
> >>>>>>>>>>>>>>>>>>>>>> to write messages.
> >>>>>>>>>>>>>>>>>>>>>> The semantics would be as follows:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>    - If the list is unspecified, then the default is to write
> >>>>>>>>>>>>>>>>>>>>>>    all messages for every writer in the indexing topology
> >>>>>>>>>>>>>>>>>>>>>>    - If the list is specified, then a writer will write all
> >>>>>>>>>>>>>>>>>>>>>>    messages if and only if it is named in the list.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Sample indexing config which turns off HDFS and keeps on
> >>>>>>>>>>>>>>>>>>>>>> Elasticsearch:
> >>>>>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>>>>>    "index" : "squid"
> >>>>>>>>>>>>>>>>>>>>>>   ,"batchSize" : 100
> >>>>>>>>>>>>>>>>>>>>>>   ,"writers" : [ "ES" ]
> >>>
> >>> --
> >>
> >> Jon
> >>
> >> Sent from my mobile device
> >>
> >

--
Nick Allen <n...@nickallen.org>
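
For anyone following the thread, the two sets of semantics quoted above (Index Filtering and the Index On/Off Switch) can be sketched roughly as below. This is only an illustration in Python, not Metron code. The function names `writers_for_message` and `enabled_writers` are hypothetical, the writer names "HDFS" and "ES" are taken from the sample configs, the `topology_writers` default assumes a topology started with Elasticsearch, and plain Python predicates stand in for Stellar statements, which this sketch does not evaluate.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, object]
Predicate = Callable[[Message], bool]

# Assumption: the topology was started with Elasticsearch, so the
# writers present are HDFS (always used) and ES.
DEFAULT_TOPOLOGY_WRITERS = ["HDFS", "ES"]


def writers_for_message(message: Message,
                        filters: Optional[Dict[str, Predicate]] = None,
                        topology_writers: Optional[List[str]] = None) -> List[str]:
    """Index Filtering: a writer with no filter passes everything through;
    a writer with a filter writes the message only if the filter is true."""
    filters = filters or {}
    topology_writers = topology_writers or DEFAULT_TOPOLOGY_WRITERS
    selected = []
    for writer in topology_writers:
        predicate = filters.get(writer)
        if predicate is None or predicate(message):
            selected.append(writer)
    return selected


def enabled_writers(writers: Optional[List[str]] = None,
                    topology_writers: Optional[List[str]] = None) -> List[str]:
    """Index On/Off Switch: an unspecified list enables every writer in the
    topology; a specified list enables only the writers it names."""
    topology_writers = topology_writers or DEFAULT_TOPOLOGY_WRITERS
    if writers is None:
        return list(topology_writers)
    return [w for w in topology_writers if w in writers]


# Python stand-ins for the Stellar statements in the sample filter config:
# "HDFS" : "false" and "ES" : "exists(field1)".
sample_filters = {
    "HDFS": lambda msg: False,
    "ES": lambda msg: "field1" in msg,
}

print(writers_for_message({"field1": "x"}, sample_filters))
print(writers_for_message({"other": 1}, sample_filters))
print(enabled_writers(["ES"]))
```

In both cases leaving the setting out keeps today's behaviour, which is the backwards-compatibility point made in the proposal. The filter form decides per message, while the list form decides once per writer.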