I'm +1 on the current proposal. I like Nick's syntax and agree with Jon's enabled property. I also like the idea of a path property for HDFS.
-Kyle

> On Jan 14, 2017, at 10:51 AM, Casey Stella <[email protected]> wrote:
>
> I'm +1 on an explicit enabled property and a filter (or when) property. I think we are zeroing in on a decent design, so that is good.
>
> To recap, what I am +1 on is Nick's proposed syntax with the following modifications:
> 1. An explicit enabled field
> 2. A default of on for unspecified writers, to match current semantics
>
> Casey
>
>> On Sat, Jan 14, 2017 at 10:45 [email protected] <[email protected]> wrote:
>>
>> This has the additional benefit of allowing something like the config below when you want to temporarily disable the HDFS writer but don't want to remove its settings. It removes the need to store the path and batchSize (and many additional settings) somewhere else so they can be brought back when you want to re-enable the writer, which is a nice workflow attribute for the end user:
>>
>> {
>>   "elasticsearch": {
>>     "enabled": "true",
>>     "index": "foo",
>>     "batchSize": 100
>>   },
>>   "hdfs": {
>>     "enabled": "false",
>>     "path": "/foo/bar/...",
>>     "batchSize": 100
>>   },
>>   "solr": {
>>     "enabled": "false"
>>   }
>> }
>>
>> Jon
>>
>>> On Sat, Jan 14, 2017 at 9:24 AM [email protected] <[email protected]> wrote:
>>>
>>> I have a similar concern there, because I prefer being as explicit as possible, which makes things easier for new users to pick up. Using my example from earlier, this could look like specifying when(false), but an even better and more obvious approach may be to use enabled(false).
>>> So the current simple default would be:
>>>
>>> {
>>>   "elasticsearch": { "enabled": "true" },
>>>   "hdfs": { "enabled": "true" },
>>>   "solr": { "enabled": "false" }
>>> }
>>>
>>> And to use ES with some overrides but not HDFS or Solr, it would look like:
>>>
>>> {
>>>   "elasticsearch": {
>>>     "enabled": "true",
>>>     "index": "foo",
>>>     "batchSize": 100
>>>   },
>>>   "hdfs": {
>>>     "enabled": "false"
>>>   },
>>>   "solr": {
>>>     "enabled": "false"
>>>   }
>>> }
>>>
>>> Jon
>>>
>>> On Fri, Jan 13, 2017 at 10:21 PM Casey Stella <[email protected]> wrote:
>>>
>>> One thing I thought of that I very strenuously dislike in Nick's proposal is that if a writer config is not specified, then that writer is turned off (I think; if I misunderstood, let me know). Right now, if we have a new sensor with no index config and no enrichment config, its messages still pass through to the index using defaults. In this new scheme they would not. This changes the default semantics of the system, and I think it changes them for the worse.
>>>
>>> I would strongly prefer an on-by-default indexing config, as we have now.
>>>
>>>> On Fri, Jan 13, 2017 at 17:13 Casey Stella <[email protected]> wrote:
>>>>
>>>> One thing that I really like about Nick's suggestion is that it allows writer-specific configs in a clear and simple way. It is more complex for the default case (all writers write to indices named the same thing with a fixed batch size), which I do not like, but maybe the compromise is worth it to make the advanced case less complex.
>>>>
>>>> Thanks a lot for the suggestion, Nick, it's interesting; I'm beginning to lean your way.
>>>>
>>>> On Fri, Jan 13, 2017 at 2:51 PM, [email protected] <[email protected]> wrote:
>>>>
>>>> I like the suggestions you made, Nick.
>>>> The only thing I would add is that it's also nice to see an explicit when(false), as people newer to the platform may not know where to expect configs for the different writers. Being able to do it either way, which I think is already assumed in your model, would make sense. I would just suggest that, if we support a writer but are disabling it, the platform insert a default when(false) to be explicit.
>>>>
>>>> Jon
>>>>
>>>> On Fri, Jan 13, 2017 at 11:59 AM Casey Stella <[email protected]> wrote:
>>>>
>>>>> Let me noodle on this over the weekend. Your syntax is looking less onerous to me, and I like the following statement from Otto: "In the end, each write destination 'type' will need its own configuration. This is an extension point."
>>>>>
>>>>> I may come around to your way of thinking.
>>>>>
>>>>> On Fri, Jan 13, 2017 at 11:57 AM, Otto Fowler <[email protected]> wrote:
>>>>>
>>>>>> In the end, each write destination 'type' will need its own configuration. This is an extension point.
>>>>>>
>>>>>> {
>>>>>> HDFS:{
>>>>>> outputAdapters:[
>>>>>> {name: avro,
>>>>>> settings:{
>>>>>> avro stuff….
>>>>>> when:{
>>>>>> },
>>>>>> {
>>>>>> name: sequence file,
>>>>>> …..
>>>>>>
>>>>>> or some such.
>>>>>>
>>>>>> On January 13, 2017 at 11:51:15, Nick Allen ([email protected]) wrote:
>>>>>>
>>>>>> I will also add that instead of global overrides, like index, we should use configuration key names that are more appropriate to each output.
>>>>>>
>>>>>> For example, does 'index' really make sense for HDFS? Or would 'path' be more appropriate?
>>>>>>
>>>>>> {
>>>>>>   "elasticsearch": {
>>>>>>     "index": "foo",
>>>>>>     "batchSize": 1
>>>>>>   },
>>>>>>   "hdfs": {
>>>>>>     "path": "/foo/bar/...",
>>>>>>     "batchSize": 100
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> OK, I've said my piece. Thanks for the effort in summarizing all this, Casey.
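The messages above converge on Nick's writer-keyed syntax, combined with Jon's explicit `enabled` flag and Casey's rule that a writer with no config stays on. The following is a minimal Python sketch of how an indexing topology might resolve such a config. It assumes the writer names `elasticsearch`, `hdfs` and `solr`, string-valued `enabled` flags as in Jon's examples, and the base-case defaults from Casey's proposal (an index named after the sensor and a `batchSize` of 1). `resolve_writers` and `writer_defaults` are hypothetical helpers, not Metron APIs.

```python
# Hypothetical sketch, not Metron's implementation. Semantics assumed from
# this thread:
#   - the config is keyed by writer name (Nick's syntax)
#   - each writer may carry an explicit "enabled" flag (Jon's suggestion)
#   - a writer with no entry, or no "enabled" key, stays on (Casey's rule)

KNOWN_WRITERS = ("elasticsearch", "hdfs", "solr")  # assumed writer names


def writer_defaults(sensor, writer):
    # Assumed defaults from Casey's base case: batchSize of 1 and an index
    # named after the sensor. HDFS is keyed by "path" rather than "index"
    # (Nick's point), so no index default is applied to it.
    defaults = {"batchSize": 1}
    if writer != "hdfs":
        defaults["index"] = sensor
    return defaults


def resolve_writers(sensor, config, writers=KNOWN_WRITERS):
    """Return {writer: effective settings} for every writer that should run."""
    resolved = {}
    for writer in writers:
        settings = dict(config.get(writer, {}))  # copy; leave config intact
        # The thread's examples use the strings "true"/"false".
        enabled = str(settings.pop("enabled", "true")).lower() == "true"
        if not enabled:
            continue  # settings stay in the config, ready for re-enabling
        resolved[writer] = {**writer_defaults(sensor, writer), **settings}
    return resolved


# Jon's example: Elasticsearch on with overrides, HDFS and Solr disabled.
config = {
    "elasticsearch": {"enabled": "true", "index": "foo", "batchSize": 100},
    "hdfs": {"enabled": "false", "path": "/foo/bar/...", "batchSize": 100},
    "solr": {"enabled": "false"},
}
print(resolve_writers("squid", config))
# {'elasticsearch': {'batchSize': 100, 'index': 'foo'}}
```

An empty config returns all three writers with their defaults. This preserves the on-by-default behavior Casey asks for, while a disabled writer keeps its settings in place, which is the workflow Jon describes.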
>>>>>> >>>>>> >>>>>> On Fri, Jan 13, 2017 at 11:42 AM, Nick Allen <[email protected]> >>>> wrote: >>>>>> >>>>>>> Nick's concerns about my suggestion were that it was overly >> complex >>>> and >>>>>>>> hard to grok and that we could dispense with backwards >>> compatibility >>>>> and >>>>>>>> make people do a bit more work on the default case for the >>> benefits >>>>> of a >>>>>>>> simpler advanced case. (Nick, make sure I don't misstate your >>>>> position) >>>>>>> >>>>>>> >>>>>>> I will add is that in my mind, the majority case would be a user >>>>>>> specifying the outputs, but not things like 'batchSize' or >> 'when'. >>> I >>>>>> think >>>>>>> in the majority case, the user would accept whatever the default >>>> batch >>>>>> size >>>>>>> is. >>>>>>> >>>>>>> Here are alternatives suggestions for all the examples that you >>>>> provided >>>>>>> previously. >>>>>>> >>>>>>> Base Case >>>>>>> >>>>>>> - The user must always specify the 'outputs' for clarity. >>>>>>> - Uses default index name, batch size and when = true. >>>>>>> >>>>>>> { >>>>>>> 'elasticsearch': {}, >>>>>>> 'hdfs': {} >>>>>>> } >>>>>>> >>>>>>> >>>>>>> < >>>>>> https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0 >>>>>> a1#writer-non-specific-case>Writer-non-specific >>>>>> >>>>>>> Case >>>>>>> >>>>>>> - There are no global overrides, as in Casey's proposal. >>>>>>> - Easier to grok IMO. 
>>>>>>>
>>>>>>> {
>>>>>>>   "elasticsearch": {
>>>>>>>     "index": "foo",
>>>>>>>     "batchSize": 100
>>>>>>>   },
>>>>>>>   "hdfs": {
>>>>>>>     "index": "foo",
>>>>>>>     "batchSize": 100
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> Writer-specific case without filters <https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0a1#writer-specific-case-without-filters>
>>>>>>>
>>>>>>> {
>>>>>>>   "elasticsearch": {
>>>>>>>     "index": "foo",
>>>>>>>     "batchSize": 1
>>>>>>>   },
>>>>>>>   "hdfs": {
>>>>>>>     "index": "foo",
>>>>>>>     "batchSize": 100
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> Writer-specific case with filters <https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0a1#writer-specific-case-with-filters>
>>>>>>>
>>>>>>> - Instead of having to say when=false, just don't configure HDFS.
>>>>>>>
>>>>>>> {
>>>>>>>   "elasticsearch": {
>>>>>>>     "index": "foo",
>>>>>>>     "batchSize": 100,
>>>>>>>     "when": "exists(field1)"
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> On Fri, Jan 13, 2017 at 11:06 AM, Casey Stella <[email protected]> wrote:
>>>>>>>
>>>>>>>> Dave,
>>>>>>>> For the benefit of posterity and people who might not be as deeply entangled in the system as we have been, I'll recap things and hopefully answer your question in the process.
>>>>>>>>
>>>>>>>> Historically, the index configuration has been split between the enrichment configs and the global configs.
>>>>>>>>
>>>>>>>> - The global configs control settings that apply to all sensors. Historically this has been stuff like index connection strings, etc.
>>>>>>>> - The sensor-specific configs control settings that vary by sensor.
>>>>>>>>
>>>>>>>> As of METRON-652 (in review currently), we moved the sensor-specific configs out of the enrichment configs.
>>>>>>>> The proposal here is to increase the granularity of the sensor-specific files so that they support index-writer-specific configs. Right now the indexing topology has two (fixed) writers: ES/Solr and HDFS.
>>>>>>>>
>>>>>>>> The proposed configuration would allow you to specify a blanket sensor-level config for the index name and batchSize and/or override them at the writer level, thereby supporting a couple of use cases:
>>>>>>>>
>>>>>>>> - Turning off certain index writers (e.g. HDFS)
>>>>>>>> - Filtering the messages written to certain index writers
>>>>>>>>
>>>>>>>> The two competing configs from Nick and me are as follows:
>>>>>>>>
>>>>>>>> - I want to make sure we keep the old sensor-specific defaults, with writer-specific overrides available.
>>>>>>>> - Nick thought we could simplify the permutations by making the indexing config consist only of writer-level configs.
>>>>>>>>
>>>>>>>> My concern about Nick's suggestion was that the default and majority case, specifying the index and the batchSize for all writers (the one we support now), would require more configuration.
>>>>>>>>
>>>>>>>> Nick's concerns about my suggestion were that it was overly complex and hard to grok and that we could dispense with backwards compatibility and make people do a bit more work on the default case for the benefits of a simpler advanced case. (Nick, make sure I don't misstate your position).
>>>>>>>>
>>>>>>>> Casey
>>>>>>>>
>>>>>>>> On Fri, Jan 13, 2017 at 10:54 AM, David Lyle <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Casey,
>>>>>>>>>
>>>>>>>>> Can you give me a level set of what your thinking is now? I think it's global control of all index types + overrides on a per-type basis.
>>>>>>>>> FWIW, I'm totally for that, but I want to make sure I'm not imposing my preconceived notions on your consensus-driven ones.
>>>>>>>>>
>>>>>>>>> -D....
>>>>>>>>>
>>>>>>>>> On Fri, Jan 13, 2017 at 10:44 AM, Casey Stella <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I am suggesting that, yes. The configs are essentially the same as yours, except there is an override specified at the top level. Without that, in order to specify that both HDFS and ES have batch sizes of 100, you have to configure each explicitly. It's less that I'm trying to have backwards compatibility and more that I'm trying to make the majority case easy: both writers write everything to a specified index name with a specified batch size (which is what we have now). Beyond that, I want to allow specifying an override for the config on a writer-by-writer basis for those who need it.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 13, 2017 at 10:39 AM, Nick Allen <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Are you saying we support all of these variants? I realize you are trying to have some backwards compatibility, but this also makes it harder for a user to grok (for me at least).
>>>>>>>>>>>
>>>>>>>>>>> Personally, I like my original example, as there are fewer sub-structures like 'writerConfig', which makes the whole thing simpler and easier to grok. But maybe others will think your proposal is just as easy to grok.
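Nick is responding here to Casey's 10:01 AM proposal, which is quoted next. That proposal layers three levels of settings: built-in defaults, sensor-level `index`/`batchSize`, and per-writer overrides under `writerConfig`. The following is a minimal Python sketch of that merge order. It assumes the base-case defaults Casey lists (an index named after the sensor, a `batchSize` of 1, and every message written). `effective_config` is a hypothetical helper, and `when` is passed through as an unevaluated string.

```python
# Hypothetical sketch of the merge order in Casey's 10:01 AM proposal:
# built-in defaults < sensor-level "index"/"batchSize" < "writerConfig".

def effective_config(sensor, sensor_config, writer):
    # Base Case defaults: index named after the sensor, batchSize of 1,
    # and every message written (the "when" filter is simply "true").
    merged = {"index": sensor, "batchSize": 1, "when": "true"}
    # Sensor-level settings apply to every writer.
    for key in ("index", "batchSize"):
        if key in sensor_config:
            merged[key] = sensor_config[key]
    # Writer-specific overrides win over everything else.
    merged.update(sensor_config.get("writerConfig", {}).get(writer, {}))
    return merged


# "Writer-specific case with filters" from the proposal.
cfg = {
    "index": "foo",
    "batchSize": 1,
    "writerConfig": {
        "elasticsearch": {"batchSize": 100, "when": "exists(field1)"},
        "hdfs": {"when": "false"},
    },
}
print(effective_config("squid", cfg, "elasticsearch"))
# {'index': 'foo', 'batchSize': 100, 'when': 'exists(field1)'}
print(effective_config("squid", cfg, "hdfs"))
# {'index': 'foo', 'batchSize': 1, 'when': 'false'}
```

The extra `writerConfig` layer is the sub-structure Nick finds harder to grok. In exchange, the common case needs only the two top-level keys.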
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 13, 2017 at 10:01 AM, Casey Stella <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> OK, so here's what I'm thinking based on the discussion:
>>>>>>>>>>>>
>>>>>>>>>>>> - Keep the configs that we have now (batchSize and index) as defaults for writers that have no writer-specific config
>>>>>>>>>>>> - Add the config Nick suggested
>>>>>>>>>>>>
>>>>>>>>>>>> *Base Case*:
>>>>>>>>>>>> {
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> - All writers write all messages
>>>>>>>>>>>> - Index named the same as the sensor for all writers
>>>>>>>>>>>> - batchSize of 1 for all writers
>>>>>>>>>>>>
>>>>>>>>>>>> *Writer-non-specific case*:
>>>>>>>>>>>> {
>>>>>>>>>>>>   "index" : "foo"
>>>>>>>>>>>>   ,"batchSize" : 100
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> - All writers write all messages
>>>>>>>>>>>> - Index is named "foo" (different from the sensor name) for all writers
>>>>>>>>>>>> - batchSize is 100 for all writers
>>>>>>>>>>>>
>>>>>>>>>>>> *Writer-specific case without filters*:
>>>>>>>>>>>> {
>>>>>>>>>>>>   "index" : "foo"
>>>>>>>>>>>>   ,"batchSize" : 1
>>>>>>>>>>>>   ,"writerConfig" : {
>>>>>>>>>>>>     "elasticsearch" : {
>>>>>>>>>>>>       "batchSize" : 100
>>>>>>>>>>>>     }
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> - All writers write all messages
>>>>>>>>>>>> - Index is named "foo" (different from the sensor name) for all writers
>>>>>>>>>>>> - batchSize is 1 for the HDFS writer and 100 for the Elasticsearch writer
>>>>>>>>>>>> - NOTE: I could override the index name too
>>>>>>>>>>>>
>>>>>>>>>>>> *Writer-specific case with filters*:
>>>>>>>>>>>> {
>>>>>>>>>>>>   "index" : "foo"
>>>>>>>>>>>>   ,"batchSize" : 1
>>>>>>>>>>>>   ,"writerConfig" : {
>>>>>>>>>>>>     "elasticsearch" : {
>>>>>>>>>>>>       "batchSize" : 100,
>>>>>>>>>>>>       "when" : "exists(field1)"
>>>>>>>>>>>>     },
>>>>>>>>>>>>     "hdfs" : {
>>>>>>>>>>>>       "when" : "false"
>>>>>>>>>>>>     }
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> - The ES writer writes messages that have field1; the HDFS writer writes nothing
>>>>>>>>>>>> - Index is named "foo" (different from the sensor name) for all writers
>>>>>>>>>>>> - batchSize is 100 for the Elasticsearch writer
>>>>>>>>>>>>
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 13, 2017 at 9:44 AM, Carolyn Duby <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> For larger installations you need to control what is indexed, so you don't end up with a nasty Elasticsearch situation and so you can mine the data later for reports and for training ML models.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Carolyn
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/13/17, 9:40 AM, "Casey Stella" <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> OH that's a good idea!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 13, 2017 at 9:39 AM, Nick Allen <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I like the "Index Filtering" option based on the flexibility that it provides. Should each output (HDFS, ES, etc.) have its own configuration settings? For example, aren't things like batching handled separately for HDFS versus Elasticsearch?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Something along the lines of...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>   "hdfs" : {
>>>>>>>>>>>>>>>     "when": "exists(field1)",
>>>>>>>>>>>>>>>     "batchSize": 100
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   "elasticsearch" : {
>>>>>>>>>>>>>>>     "when": "true",
>>>>>>>>>>>>>>>     "batchSize": 1000,
>>>>>>>>>>>>>>>     "index": "squid"
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jan 13, 2017 at 9:10 AM, Casey Stella <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yeah, I tend to like the first option too. Any opposition to that from anyone?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The points brought up are good ones, and I think it may be worth a broader discussion of the requirements of indexing in a separate dev list thread. Maybe a list of desires with coherent use cases justifying them, so we can think about how this stuff should work and where the natural extension points should be. After all, we need to walk the line between engineering and overengineering for features nobody will want.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not sure about the extensions to the standard fields. I'm torn between the notion that we should have no standard fields and the notion that we should have a boatload of standard fields (with most of them empty). I exchange positions fairly regularly on that question. ;) It may be worth a dev list discussion to lay out how you imagine an extension of standard fields and how it might look as implemented in Metron.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Casey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 9:58 PM, Kyle Richardson <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'll second my preference for the first option. I think the ability to use Stellar filters to customize indexing would be a big win.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm glad Matt brought up the point about data lake and CEP. I think this is a really important use case that we need to consider. Take a simple example... If I have data coming in from 3 different firewall vendors and 2 different web proxy/URL filtering vendors and I want to be able to analyze that data set, I need the data to be indexed all together (likely in HDFS) and to have a normalized schema such that IP address, URL, and user name (to take a few) can be easily queried and aggregated. I can also envision scenarios where I would want to index data based on attributes other than sensor, such as business unit or subsidiary.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've been wanting to propose extending our 7 standard fields to include things like URL and user. Is there community interest/support for moving in that direction? If so, I'll start a new thread.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Kyle
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 6:51 PM, Matt Foley <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ah, I see. If overriding the default index name allows using the same name for multiple sensors, then the goal can be achieved.
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> --Matt
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 1/12/17, 3:30 PM, "Casey Stella" <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Oh, you could! Let's say you have a syslog parser with data from sources 1, 2 and 3. You'd end up with one Kafka queue with 3 parsers attached to that queue, each picking apart the messages from source 1, 2 and 3. They'd go through separate enrichment and into the indexing topology. In the indexing topology, you could specify the same index name "syslog", and all of the messages would go into the same index for CEP querying, if so desired.
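Casey's reply above depends on the index name defaulting to the sensor name unless it is overridden. The following minimal Python sketch shows how overriding `index` to the same value routes several sensors into one shared index, while a sensor with no config keeps its own. The three syslog sensor names are made up, and `index_for` and `group_by_index` are illustrative helpers, not Metron code.

```python
from collections import defaultdict

# Hypothetical sketch: the index name defaults to the sensor name, so giving
# several sensors the same "index" value sends their messages to one index.


def index_for(sensor, indexing_configs):
    return indexing_configs.get(sensor, {}).get("index", sensor)


def group_by_index(messages, indexing_configs):
    """Group (sensor, message) pairs by the index they would be written to."""
    grouped = defaultdict(list)
    for sensor, message in messages:
        grouped[index_for(sensor, indexing_configs)].append(message)
    return dict(grouped)


# Three parser sensors picking apart one syslog feed (names are made up),
# all configured to write to the shared "syslog" index.
configs = {
    "syslog_source1": {"index": "syslog"},
    "syslog_source2": {"index": "syslog"},
    "syslog_source3": {"index": "syslog"},
}
stream = [
    ("syslog_source1", "event from source 1"),
    ("syslog_source2", "event from source 2"),
    ("syslog_source3", "event from source 3"),
    ("squid", "proxy event"),  # no config: index defaults to "squid"
]
print(group_by_index(stream, configs))
# {'syslog': ['event from source 1', 'event from source 2', 'event from source 3'], 'squid': ['proxy event']}
```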
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 6:27 PM, Matt Foley <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Syslog is hell on parsers – I know, I worked at LogLogic in a previous life. It makes perfect sense to route different lines from syslog through different appropriate parsers. But a lot of what the parsers do is identify consistent subsets of metadata and annotate it – e.g., src_ip_addr, event timestamps, etc. Once those metadata are annotated and available with common field names, why doesn't it make sense to index the messages together, for CEP querying? I think Splunk has illustrated this model.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 1/12/17, 3:00 PM, "Casey Stella" <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yeah, I mean, honestly, I think the approach we've taken for sources that aggregate different types of data is to provide filters at the parser level and have multiple parser topologies (with different, possibly mutually exclusive filters) running. Each of these would be a completely separate sensor. Imagine a syslog data source that aggregates data from several sources, where you want to pick apart certain pieces of the messages. This is why the initial thought and architecture was one index per sensor.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 5:55 PM, Matt Foley <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm thinking that CEP (Complex Event Processing) is contrary to the idea of siloing data per sensor.
>>>>>>>>>>>>>>>>>>>> Now, it's true that some of those sensors are already aggregating data from multiple sources, so maybe I'm wrong here.
>>>>>>>>>>>>>>>>>>>> But it just seems to me that the "data lake" insights come from being able to make decisions over the whole mass of data rather than just vertical slices of it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 1/12/17, 2:15 PM, "Casey Stella" <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hey Matt,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the comment!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. At the moment, we only have one index name, which defaults to the sensor name, but that's entirely up to the user. This is sensor-specific, so it'd be a separate config for each sensor. If we want to build multiple indices per sensor, we'd have to think carefully about how to do that, and it would be a bigger undertaking. I guess I can see the use, though (redirecting messages to one index vs. another based on a predicate for a given sensor). Anyway, this is not where I originally thought this discussion would go, but it's an interesting point.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2. I hadn't thought through the implementation quite yet, but we don't actually have a splitter bolt in that topology, just a spout that goes to the Elasticsearch writer and also to the HDFS writer.
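Casey's second point above describes the indexing topology as a single spout feeding both the Elasticsearch and HDFS writers, with no splitter bolt. In that arrangement, one possible place for per-writer filtering is inside each writer: every writer sees every message and applies its own predicate and batch size. The thread does not settle this question. The plain-Python sketch below simulates the idea without Storm. `FilteringWriter` and `fan_out` are hypothetical names, and the lambdas stand in for Stellar `when` expressions.

```python
# Hypothetical sketch (plain Python, not Storm): one source fans every
# message out to each writer, and each writer applies its own filter and
# batch size before "writing" a batch.

class FilteringWriter:
    def __init__(self, name, batch_size, predicate):
        self.name = name
        self.batch_size = batch_size
        self.predicate = predicate  # stand-in for a Stellar "when" filter
        self.pending = []
        self.flushed = []           # batches that would be written out

    def accept(self, message):
        if not self.predicate(message):
            return                  # filtered out for this writer only
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flushed.append(self.pending)
            self.pending = []


def fan_out(messages, writers):
    for message in messages:
        for writer in writers:      # every writer sees every message
            writer.accept(message)
    for writer in writers:
        writer.flush()              # write any partial final batch


es = FilteringWriter("elasticsearch", 2, lambda m: "field1" in m)
hdfs = FilteringWriter("hdfs", 100, lambda m: True)
fan_out([{"field1": 1}, {"field2": 2}, {"field1": 3}], [es, hdfs])
print(es.flushed)    # [[{'field1': 1}, {'field1': 3}]]
print(hdfs.flushed)  # [[{'field1': 1}, {'field2': 2}, {'field1': 3}]]
```

Keeping the filter in each writer avoids a new routing component. The cost is that every writer receives, and must evaluate, every message.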
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 4:52 PM, Matt Foley <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Casey, good to have controls like this. A couple of questions:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. Regarding the "index" : "squid" name/value pair, is the index name expected to always be a sensor name? Or is the given JSON structure subordinate to a sensor name in ZooKeeper? Or can we build arbitrary indexes with this new specification, independent of sensor? Should there actually be a list of "indexes", i.e.
>>>>>>>>>>>>>>>>>>>>> { "indexes" : [
>>>>>>>>>>>>>>>>>>>>>     { "index" : "name1",
>>>>>>>>>>>>>>>>>>>>>       …
>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>     { "index" : "name2",
>>>>>>>>>>>>>>>>>>>>>       …
>>>>>>>>>>>>>>>>>>>>>     } ]
>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2. Would the filtering / writer selection logic take place in the indexing topology splitter bolt? It seems like that would have the smallest impact on the current implementation, no?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Sorry if these are already answered in PR-415; I haven't had time to review that one yet.
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> --Matt
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 1/12/17, 12:55 PM, "Michael Miklavcic" <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I like the flexibility and expressiveness of the first option with Stellar filters.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> M
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> As of METRON-652 <https://github.com/apache/incubator-metron/pull/415>, we will have decoupled the indexing configuration from the enrichment configuration. As an immediate follow-up to that, I'd like to provide the ability to turn writers on and off via the configs. I'd like to get some community feedback on how the functionality should work, if y'all are amenable. :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> As of now, we have 3 possible writers which can be used in the indexing topology:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - Solr
>>>>>>>>>>>>>>>>>>>>>> - Elasticsearch
>>>>>>>>>>>>>>>>>>>>>> - HDFS
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> HDFS is always used; Elasticsearch or Solr is used depending on how you start the indexing topology.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> A couple of proposals come to mind immediately:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> *Index Filtering*
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> You would be able to specify a filter defined by a Stellar statement (likely a reuse of the StellarFilter that exists in the Parsers), which would allow you to indicate on a message-by-message basis whether or not to write the message.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The semantics of this would be as follows:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - The default (i.e. unspecified) is to pass everything through (hence backwards compatible with the current default config).
>>>>>>>>>>>>>>>>>>>>>> - Messages for which the writer type's associated Stellar statement evaluates to true will be written; otherwise they will not.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sample indexing config which would write out no messages to HDFS and write out to ES only messages containing a field called "field1":
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>   "index" : "squid"
>>>>>>>>>>>>>>>>>>>>>>   ,"batchSize" : 100
>>>>>>>>>>>>>>>>>>>>>>   ,"filters" : {
>>>>>>>>>>>>>>>>>>>>>>     "HDFS" : "false"
>>>>>>>>>>>>>>>>>>>>>>     ,"ES" : "exists(field1)"
>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> *Index On/Off Switch*
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> A simpler solution would be to just provide a list of the writers that should write messages. The semantics would be as follows:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> - If the list is unspecified, then the default is to write all messages for every writer in the indexing topology.
>>>>>>>>>>>>>>>>>>>>>> - If the list is specified, then a writer will write all messages if and only if it is named in the list.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sample indexing config which turns off HDFS and keeps Elasticsearch on:
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>   "index" : "squid"
>>>>>>>>>>>>>>>>>>>>>>   ,"batchSize" : 100
>>>>>>>>>>>>>>>>>>>>>>   ,"writers" : [ "ES" ]
>>>>>>>>>>>>>>>>>>>>>> }
>>>
>>> --
>>
>> Jon
>>
>> Sent from my mobile device
>>
