An explicit on/off switch seems like a good option to have. That way I don't have to remove the whole config block just to test something. I also think that if the config for a writer is unspecified, we should emit a warning.
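
A rough Python sketch of how that could behave, purely for illustration. This is not Metron code: the function name `resolve_writers`, the `KNOWN_WRITERS` list and the warning text are all assumptions. It takes the per-writer config discussed below, treats a missing 'enabled' as on (to match current semantics), and warns when a writer has no config block at all.

```python
import warnings

# Hypothetical list of writers the platform knows about (names taken from this thread).
KNOWN_WRITERS = ("elasticsearch", "hdfs", "solr")


def resolve_writers(indexing_config, known_writers=KNOWN_WRITERS):
    """Return {writer: settings} with an explicit boolean 'enabled' for every writer."""
    resolved = {}
    for writer in known_writers:
        if writer not in indexing_config:
            # Unspecified writer: keep it on by default, but tell the user.
            warnings.warn(f"no config for writer '{writer}'; defaulting to enabled")
            settings = {}
        else:
            # Copy so the caller's config is not mutated.
            settings = dict(indexing_config[writer])

        enabled = settings.get("enabled", True)
        if isinstance(enabled, str):
            # The examples in this thread use the strings 'true' / 'false'.
            enabled = enabled.strip().lower() == "true"
        settings["enabled"] = bool(enabled)
        resolved[writer] = settings
    return resolved


config = {
    "elasticsearch": {"enabled": "true", "index": "foo", "batchSize": 100},
    "hdfs": {"enabled": "false", "path": "/foo/bar/...", "batchSize": 100},
}
writers = resolve_writers(config)
```

With this, the disabled hdfs block keeps its path and batchSize, so re-enabling it later is a one-field change, and the missing solr block triggers the warning rather than silently changing behavior.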
16.01.2017, 08:54, "Nick Allen" <[email protected]>:
>> To recap, what I am +1 on is Nick's proposed syntax with the following
>> modifications:
>> 1. An explicit enabled field
>> 2. A default on for unspecified to match current semantics
>
> I'm +1 on all of this.
>
> On Sat, Jan 14, 2017 at 10:51 AM, Casey Stella <[email protected]> wrote:
>
>> I'm +1 on an explicit enabled property and a filter (or when) property. I
>> think we are zeroing in on a decent design, so that is good.
>>
>> To recap, what I am +1 on is Nick's proposed syntax with the following
>> modifications:
>> 1. An explicit enabled field
>> 2. A default on for unspecified to match current semantics
>>
>> Casey
>> On Sat, Jan 14, 2017 at 10:45 [email protected] <[email protected]> wrote:
>>
>> > This has the additional benefit of doing something like below when you want
>> > to temporarily disable the hdfs writer, but don't want to remove the
>> > settings. This removes the need to store the path and batchSize (and many
>> > additional settings) somewhere else so they can be brought back in when you
>> > want to re-enable it, which is a nice workflow attribute for the end user:
>> >
>> > {
>> >   'elasticsearch': {
>> >     'enabled': 'true',
>> >     'index': 'foo',
>> >     'batchSize': 100,
>> >   },
>> >   'hdfs': {
>> >     'enabled': 'false',
>> >     'path': '/foo/bar/...',
>> >     'batchSize': 100,
>> >   },
>> >   'solr': {
>> >     'enabled': 'false'
>> >   }
>> > }
>> >
>> > Jon
>> >
>> > On Sat, Jan 14, 2017 at 9:24 AM [email protected] <[email protected]> wrote:
>> >
>> > > I similarly have a concern there because I prefer being as explicit as
>> > > possible, which makes things easier to pick up for new users. Using my
>> > > example from earlier this could look like specifying while(false), but an
>> > > even better and more obvious approach may be to use enabled(false).
>> > > So the current simple default would be:
>> > >
>> > > {
>> > >   'elasticsearch': { 'enabled': 'true' },
>> > >   'hdfs': { 'enabled': 'true' },
>> > >   'solr': { 'enabled': 'false' }
>> > > }
>> > >
>> > > And to use ES with some overrides but not HDFS or solr it would look like:
>> > >
>> > > {
>> > >   'elasticsearch': {
>> > >     'enabled': 'true',
>> > >     'index': 'foo',
>> > >     'batchSize': 100
>> > >   },
>> > >   'hdfs': {
>> > >     'enabled': 'false'
>> > >   },
>> > >   'solr': {
>> > >     'enabled': 'false'
>> > >   }
>> > > }
>> > >
>> > > Jon
>> > >
>> > > On Fri, Jan 13, 2017 at 10:21 PM Casey Stella <[email protected]> wrote:
>> > >
>> > > One thing that I thought of that I very strenuously do not like in Nick's
>> > > proposal is that if a writer config is not specified then it is turned off
>> > > (I think; if I misunderstood let me know). In the situation where we have a
>> > > new sensor, right now if there is no index config and no enrichment
>> > > config, it still passes through to the index using defaults. In this new
>> > > scheme it would not. This changes the default semantics for the system and
>> > > I think it changes it for the worse.
>> > >
>> > > I would strongly prefer an on-by-default indexing config as we have now.
>> > > On Fri, Jan 13, 2017 at 17:13 Casey Stella <[email protected]> wrote:
>> > >
>> > > > One thing that I really like about Nick's suggestion is that it allows
>> > > > writer-specific configs in a clear and simple way. It is more complex for
>> > > > the default case (all writers write to indices named the same thing with a
>> > > > fixed batch size), which I do not like, but maybe it's worth the compromise
>> > > > to make it less complex for the advanced case.
>> > > >
>> > > > Thanks a lot for the suggestion, Nick, it's interesting; I'm beginning to
>> > > > lean your way.
>> > > >
>> > > > On Fri, Jan 13, 2017 at 2:51 PM, [email protected] <[email protected]> wrote:
>> > > >
>> > > > I like the suggestions you made, Nick. The only thing I would add is that
>> > > > it's also nice to see an explicit when(false), as people newer to the
>> > > > platform may not know where to expect configs for the different writers.
>> > > > Being able to do it either way, which I think is already assumed in your
>> > > > model, would make sense. I would just suggest that, if we support but are
>> > > > disabling a writer, the platform inserts a default when(false) to be
>> > > > explicit.
>> > > >
>> > > > Jon
>> > > >
>> > > > On Fri, Jan 13, 2017 at 11:59 AM Casey Stella <[email protected]> wrote:
>> > > >
>> > > > > Let me noodle on this over the weekend. Your syntax is looking less
>> > > > > onerous to me and I like the following statement from Otto: "In the end,
>> > > > > each write destination ‘type’ will need its own configuration. This is an
>> > > > > extension point."
>> > > > >
>> > > > > I may come around to your way of thinking.
>> > > > >
>> > > > > On Fri, Jan 13, 2017 at 11:57 AM, Otto Fowler <[email protected]> wrote:
>> > > > >
>> > > > > > In the end, each write destination ‘type’ will need its own
>> > > > > > configuration. This is an extension point.
>> > > > > > {
>> > > > > > HDFS:{
>> > > > > > outputAdapters:[
>> > > > > > {name: avro,
>> > > > > > settings:{
>> > > > > > avro stuff….
>> > > > > > when:{
>> > > > > > },
>> > > > > > {
>> > > > > > name: sequence file,
>> > > > > > …..
>> > > > > >
>> > > > > > or some such.
>> > > > > >
>> > > > > >
>> > > > > > On January 13, 2017 at 11:51:15, Nick Allen ([email protected]) wrote:
>> > > > > >
>> > > > > > I will add also that instead of global overrides, like index, we should use
>> > > > > > configuration key names that are more appropriate to the output.
>> > > > > >
>> > > > > > For example, does 'index' really make sense for HDFS? Or would 'path' be
>> > > > > > more appropriate?
>> > > > > >
>> > > > > > {
>> > > > > >   'elasticsearch': {
>> > > > > >     'index': 'foo',
>> > > > > >     'batchSize': 1
>> > > > > >   },
>> > > > > >   'hdfs': {
>> > > > > >     'path': '/foo/bar/...',
>> > > > > >     'batchSize': 100
>> > > > > >   }
>> > > > > > }
>> > > > > >
>> > > > > > Ok, I've said my piece. Thanks for the effort in summarizing all this,
>> > > > > > Casey.
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Jan 13, 2017 at 11:42 AM, Nick Allen <[email protected]> wrote:
>> > > > > >
>> > > > > > >> Nick's concerns about my suggestion were that it was overly complex and
>> > > > > > >> hard to grok and that we could dispense with backwards compatibility and
>> > > > > > >> make people do a bit more work on the default case for the benefits of a
>> > > > > > >> simpler advanced case. (Nick, make sure I don't misstate your position)
>> > > > > > >
>> > > > > > >
>> > > > > > > I will add that in my mind, the majority case would be a user
>> > > > > > > specifying the outputs, but not things like 'batchSize' or 'when'. I think
>> > > > > > > in the majority case, the user would accept whatever the default batch size
>> > > > > > > is.
>> > > > > > >
>> > > > > > > Here are alternative suggestions for all the examples that you provided
>> > > > > > > previously.
>> > > > > > >
>> > > > > > > Base Case
>> > > > > > >
>> > > > > > > - The user must always specify the 'outputs' for clarity.
>> > > > > > > - Uses default index name, batch size and when = true.
>> > > > > > >
>> > > > > > > {
>> > > > > > >   'elasticsearch': {},
>> > > > > > >   'hdfs': {}
>> > > > > > > }
>> > > > > > >
>> > > > > > > <https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0a1#writer-non-specific-case>
>> > > > > > > Writer-non-specific Case
>> > > > > > >
>> > > > > > > - There are no global overrides, as in Casey's proposal.
>> > > > > > > - Easier to grok IMO.
>> > > > > > >
>> > > > > > > {
>> > > > > > >   'elasticsearch': {
>> > > > > > >     'index': 'foo',
>> > > > > > >     'batchSize': 100
>> > > > > > >   },
>> > > > > > >   'hdfs': {
>> > > > > > >     'index': 'foo',
>> > > > > > >     'batchSize': 100
>> > > > > > >   }
>> > > > > > > }
>> > > > > > >
>> > > > > > > <https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0a1#writer-specific-case-without-filters>
>> > > > > > > Writer-specific case without filters
>> > > > > > >
>> > > > > > > {
>> > > > > > >   'elasticsearch': {
>> > > > > > >     'index': 'foo',
>> > > > > > >     'batchSize': 1
>> > > > > > >   },
>> > > > > > >   'hdfs': {
>> > > > > > >     'index': 'foo',
>> > > > > > >     'batchSize': 100
>> > > > > > >   }
>> > > > > > > }
>> > > > > > >
>> > > > > > > <https://gist.github.com/nickwallen/489735b65cdb38aae6e45cec7633a0a1#writer-specific-case-with-filters>
>> > > > > > > Writer-specific case with filters
>> > > > > > >
>> > > > > > > - Instead of having to say when=false, just don't configure HDFS
>> > > > > > >
>> > > > > > > {
>> > > > > > >   'elasticsearch': {
>> > > > > > >     'index': 'foo',
>> > > > > > >     'batchSize': 100,
>> > > > > > >     'when': 'exists(field1)'
>> > > > > > >   }
>> > > > > > > }
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, Jan 13, 2017 at 11:06 AM, Casey Stella <[email protected]> wrote:
>> > > > > > >
>> > > > > > >> Dave,
>> > > > > > >> For the benefit of posterity and people who might not be as deeply
>> > > > > > >> entangled in the system as we have been, I'll recap things and hopefully
>> > > > > > >> answer your question in the process.
>> > > > > > >>
>> > > > > > >> Historically the index configuration is split between the enrichment
>> > > > > > >> configs and the global configs.
>> > > > > > >>
>> > > > > > >> - The global configs control settings that apply to all sensors.
>> > > > > > >>   Historically this has been stuff like index connection strings, etc.
>> > > > > > >> - The sensor-specific configs control things that vary by sensor.
>> > > > > > >>
>> > > > > > >> As of Metron-652 (in review currently), we moved the sensor specific
>> > > > > > >> configs from the enrichment configs. The proposal here is to increase the
>> > > > > > >> granularity of the sensor specific files to make them support index
>> > > > > > >> writer-specific configs. Right now in the indexing topology, we have 2
>> > > > > > >> writers (fixed): ES/Solr and HDFS.
>> > > > > > >>
>> > > > > > >> The proposed configuration would allow you to either specify a blanket
>> > > > > > >> sensor-level config for the index name and batchSize and/or override at the
>> > > > > > >> writer level, thereby supporting a couple of use-cases:
>> > > > > > >>
>> > > > > > >> - Turning off certain index writers (e.g. HDFS)
>> > > > > > >> - Filtering the messages written to certain index writers
>> > > > > > >>
>> > > > > > >> The two competing configs between Nick and me are as follows:
>> > > > > > >>
>> > > > > > >> - I want to make sure we keep the old sensor-specific defaults with
>> > > > > > >>   writer-specific overrides available
>> > > > > > >> - Nick thought we could simplify the permutations by making the indexing
>> > > > > > >>   config only the writer-level configs.
>> > > > > > >>
>> > > > > > >> My concerns about Nick's suggestion were that the default and majority
>> > > > > > >> case, specifying the index and the batchSize for all writers (the one we
>> > > > > > >> support now), would require more configuration.
>> > > > > > >>
>> > > > > > >> Nick's concerns about my suggestion were that it was overly complex and
>> > > > > > >> hard to grok and that we could dispense with backwards compatibility and
>> > > > > > >> make people do a bit more work on the default case for the benefits of a
>> > > > > > >> simpler advanced case. (Nick, make sure I don't misstate your position).
>> > > > > > >>
>> > > > > > >> Casey
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> On Fri, Jan 13, 2017 at 10:54 AM, David Lyle <[email protected]> wrote:
>> > > > > > >>
>> > > > > > >> > Casey,
>> > > > > > >> >
>> > > > > > >> > Can you give me a level set of what your thinking is now? I think it's
>> > > > > > >> > global control of all index types + overrides on a per-type basis. Fwiw,
>> > > > > > >> > I'm totally for that, but I want to make sure I'm not imposing my
>> > > > > > >> > preconceived notions on your consensus-driven ones.
>> > > > > > >> >
>> > > > > > >> > -D....
>> > > > > > >> >
>> > > > > > >> > On Fri, Jan 13, 2017 at 10:44 AM, Casey Stella <[email protected]> wrote:
>> > > > > > >> >
>> > > > > > >> > > I am suggesting that, yes. The configs are essentially the same as yours,
>> > > > > > >> > > except there is an override specified at the top level. Without that, in
>> > > > > > >> > > order to specify that both HDFS and ES have batch sizes of 100, you have to
>> > > > > > >> > > explicitly configure each. It's less that I'm trying to have backwards
>> > > > > > >> > > compatibility and more that I'm trying to make the majority case easy: both
>> > > > > > >> > > writers write everything to a specified index name with a specified batch
>> > > > > > >> > > size (which is what we have now). Beyond that, I want to allow for
>> > > > > > >> > > specifying an override for the config on a writer-by-writer basis for those
>> > > > > > >> > > who need it.
>> > > > > > >> > >
>> > > > > > >> > > On Fri, Jan 13, 2017 at 10:39 AM, Nick Allen <[email protected]> wrote:
>> > > > > > >> > >
>> > > > > > >> > > > Are you saying we support all of these variants? I realize you are trying
>> > > > > > >> > > > to have some backwards compatibility, but this also makes it harder for a
>> > > > > > >> > > > user to grok (for me at least).
>> > > > > > >> > > >
>> > > > > > >> > > > Personally I like my original example as there are fewer sub-structures,
>> > > > > > >> > > > like 'writerConfig', which makes the whole thing simpler and easier to
>> > > > > > >> > > > grok.
>> > > > > > >> > > > But maybe others will think your proposal is just as easy to grok.
>> > > > > > >> > > >
>> > > > > > >> > > >
>> > > > > > >> > > > On Fri, Jan 13, 2017 at 10:01 AM, Casey Stella <[email protected]> wrote:
>> > > > > > >> > > >
>> > > > > > >> > > > > Ok, so here's what I'm thinking based on the discussion:
>> > > > > > >> > > > >
>> > > > > > >> > > > > - Keeping the configs that we have now (batchSize and index) as defaults
>> > > > > > >> > > > >   for the unspecified writer-specific case
>> > > > > > >> > > > > - Adding the config Nick suggested
>> > > > > > >> > > > >
>> > > > > > >> > > > > *Base Case*:
>> > > > > > >> > > > > {
>> > > > > > >> > > > > }
>> > > > > > >> > > > >
>> > > > > > >> > > > > - all writers write all messages
>> > > > > > >> > > > > - index named the same as the sensor for all writers
>> > > > > > >> > > > > - batchSize of 1 for all writers
>> > > > > > >> > > > >
>> > > > > > >> > > > > *Writer-non-specific case*:
>> > > > > > >> > > > > {
>> > > > > > >> > > > >   "index" : "foo"
>> > > > > > >> > > > >  ,"batchSize" : 100
>> > > > > > >> > > > > }
>> > > > > > >> > > > >
>> > > > > > >> > > > > - All writers write all messages
>> > > > > > >> > > > > - index is named "foo", different from the sensor for all writers
>> > > > > > >> > > > > - batchSize is 100 for all writers
>> > > > > > >> > > > >
>> > > > > > >> > > > > *Writer-specific case without filters*
>> > > > > > >> > > > > {
>> > > > > > >> > > > >   "index" : "foo"
>> > > > > > >> > > > >  ,"batchSize" : 1
>> > > > > > >> > > > >  , "writerConfig" :
>> > > > > > >> > > > >   {
>> > > > > > >> > > > >     "elasticsearch" : {
>> > > > > > >> > > > >       "batchSize" : 100
>> > > > > > >> > > > >     }
>> > > > > > >> > > > >   }
>> > > > > > >> > > > > }
>> > > > > > >> > > > >
>> > > > > > >> > > > > - All writers write all messages
>> > > > > > >> > > > > - index is named "foo", different from the sensor for all writers
>> > > > > > >> > > > > - batchSize is 1 for HDFS and 100 for elasticsearch writers
>> > > > > > >> > > > > - NOTE: I could override the index name too
>> > > > > > >> > > > >
>> > > > > > >> > > > > *Writer-specific case with filters*
>> > > > > > >> > > > > {
>> > > > > > >> > > > >   "index" : "foo"
>> > > > > > >> > > > >  ,"batchSize" : 1
>> > > > > > >> > > > >  , "writerConfig" :
>> > > > > > >> > > > >   {
>> > > > > > >> > > > >     "elasticsearch" : {
>> > > > > > >> > > > >       "batchSize" : 100,
>> > > > > > >> > > > >       "when" : "exists(field1)"
>> > > > > > >> > > > >     },
>> > > > > > >> > > > >     "hdfs" : {
>> > > > > > >> > > > >       "when" : "false"
>> > > > > > >> > > > >     }
>> > > > > > >> > > > >   }
>> > > > > > >> > > > > }
>> > > > > > >> > > > >
>> > > > > > >> > > > > - ES writer writes messages which have field1, HDFS doesn't
>> > > > > > >> > > > > - index is named "foo", different from the sensor for all writers
>> > > > > > >> > > > > - batchSize is 100 for elasticsearch writers
>> > > > > > >> > > > >
>> > > > > > >> > > > > Thoughts?
>> > > > > > >> > > > >
>> > > > > > >> > > > > On Fri, Jan 13, 2017 at 9:44 AM, Carolyn Duby <[email protected]> wrote:
>> > > > > > >> > > > >
>> > > > > > >> > > > > > For larger installations you need to control what is indexed so you don’t
>> > > > > > >> > > > > > end up with a nasty Elasticsearch situation and so you can mine the data
>> > > > > > >> > > > > > later for reports and training ML models.
>> > > > > > >> > > > > >
>> > > > > > >> > > > > > Thanks
>> > > > > > >> > > > > > Carolyn
>> > > > > > >> > > > > >
>> > > > > > >> > > > > >
>> > > > > > >> > > > > > On 1/13/17, 9:40 AM, "Casey Stella" <[email protected]> wrote:
>> > > > > > >> > > > > >
>> > > > > > >> > > > > > >OH that's a good idea!
>> > > > > > >> > > > > > >
>> > > > > > >> > > > > > >On Fri, Jan 13, 2017 at 9:39 AM, Nick Allen <[email protected]> wrote:
>> > > > > > >> > > > > > >
>> > > > > > >> > > > > > >> I like the "Index Filtering" option based on the flexibility that it
>> > > > > > >> > > > > > >> provides. Should each output (HDFS, ES, etc) have its own configuration
>> > > > > > >> > > > > > >> settings? For example, aren't things like batching handled separately for
>> > > > > > >> > > > > > >> HDFS versus Elasticsearch?
>> > > > > > >> > > > > > >>
>> > > > > > >> > > > > > >> Something along the lines of...
>> > > > > > >> > > > > > >>
>> > > > > > >> > > > > > >> {
>> > > > > > >> > > > > > >>   "hdfs" : {
>> > > > > > >> > > > > > >>     "when": "exists(field1)",
>> > > > > > >> > > > > > >>     "batchSize": 100
>> > > > > > >> > > > > > >>   },
>> > > > > > >> > > > > > >>
>> > > > > > >> > > > > > >>   "elasticsearch" : {
>> > > > > > >> > > > > > >>     "when": "true",
>> > > > > > >> > > > > > >>     "batchSize": 1000,
>> > > > > > >> > > > > > >>     "index": "squid"
>> > > > > > >> > > > > > >>   }
>> > > > > > >> > > > > > >> }
>> > > > > > >> > > > > > >>
>> > > > > > >> > > > > > >>
>> > > > > > >> > > > > > >> On Fri, Jan 13, 2017 at 9:10 AM, Casey Stella <[email protected]> wrote:
>> > > > > > >> > > > > > >>
>> > > > > > >> > > > > > >> > Yeah, I tend to like the first option too. Any opposition to that from
>> > > > > > >> > > > > > >> > anyone?
>> > > > > > >> > > > > > >> >
>> > > > > > >> > > > > > >> > The points brought up are good ones and I think that it may be worth a
>> > > > > > >> > > > > > >> > broader discussion of the requirements of indexing in a separate dev list
>> > > > > > >> > > > > > >> > thread. Maybe a list of desires with coherent use-cases justifying them so
>> > > > > > >> > > > > > >> > we can think about how this stuff should work and where the natural
>> > > > > > >> > > > > > >> > extension points should be.
>> > > > > > >> > > > > > >> > After all, we need to toe the line between engineering and
>> > > > > > >> > > > > > >> > overengineering for features nobody will want.
>> > > > > > >> > > > > > >> >
>> > > > > > >> > > > > > >> > I'm not sure about the extensions to the standard fields. I'm torn between
>> > > > > > >> > > > > > >> > the notions that we should have no standard fields vs we should have a
>> > > > > > >> > > > > > >> > boatload of standard fields (with most of them empty). I exchange
>> > > > > > >> > > > > > >> > positions fairly regularly on that question. ;) It may be worth a dev list
>> > > > > > >> > > > > > >> > discussion to lay out how you imagine an extension of standard fields and
>> > > > > > >> > > > > > >> > how it might look as implemented in Metron.
>> > > > > > >> > > > > > >> >
>> > > > > > >> > > > > > >> > Casey
>> > > > > > >> > > > > > >> >
>> > > > > > >> > > > > > >> > On Thu, Jan 12, 2017 at 9:58 PM, Kyle Richardson <[email protected]> wrote:
>> > > > > > >> > > > > > >> >
>> > > > > > >> > > > > > >> > > I'll second my preference for the first option. I think the ability to use
>> > > > > > >> > > > > > >> > > Stellar filters to customize indexing would be a big win.
>> > > > > > >> > > > > > >> > >
>> > > > > > >> > > > > > >> > > I'm glad Matt brought up the point about data lake and CEP. I think this is
>> > > > > > >> > > > > > >> > > a really important use case that we need to consider. Take a simple
>> > > > > > >> > > > > > >> > > example... If I have data coming in from 3 different firewall vendors and 2
>> > > > > > >> > > > > > >> > > different web proxy/url filtering vendors and I want to be able to analyze
>> > > > > > >> > > > > > >> > > that data set, I need the data to be indexed all together (likely in HDFS)
>> > > > > > >> > > > > > >> > > and to have a normalized schema such that IP address, URL, and user name
>> > > > > > >> > > > > > >> > > (to take a few) can be easily queried and aggregated. I can also envision
>> > > > > > >> > > > > > >> > > scenarios where I would want to index data based on attributes other than
>> > > > > > >> > > > > > >> > > sensor, business unit or subsidiary for example.
>> > > > > > >> > > > > > >> > >
>> > > > > > >> > > > > > >> > > I've been wanting to propose extending our 7 standard fields to include
>> > > > > > >> > > > > > >> > > things like URL and user.
>> > > > > > >> > > > > > >> > > Is there community interest/support for moving in that direction? If so,
>> > > > > > >> > > > > > >> > > I'll start a new thread.
>> > > > > > >> > > > > > >> > >
>> > > > > > >> > > > > > >> > > Thanks!
>> > > > > > >> > > > > > >> > >
>> > > > > > >> > > > > > >> > > -Kyle
>> > > > > > >> > > > > > >> > >
>> > > > > > >> > > > > > >> > > On Thu, Jan 12, 2017 at 6:51 PM, Matt Foley <[email protected]> wrote:
>> > > > > > >> > > > > > >> > >
>> > > > > > >> > > > > > >> > > > Ah, I see. If overriding the default index name allows using the same
>> > > > > > >> > > > > > >> > > > name for multiple sensors, then the goal can be achieved.
>> > > > > > >> > > > > > >> > > > Thanks,
>> > > > > > >> > > > > > >> > > > --Matt
>> > > > > > >> > > > > > >> > > >
>> > > > > > >> > > > > > >> > > >
>> > > > > > >> > > > > > >> > > > On 1/12/17, 3:30 PM, "Casey Stella" <[email protected]> wrote:
>> > > > > > >> > > > > > >> > > >
>> > > > > > >> > > > > > >> > > > Oh, you could! Let's say you have a syslog parser with data from sources 1,
>> > > > > > >> > > > > > >> > > > 2 and 3. You'd end up with one Kafka queue with 3 parsers attached to that
>> > > > > > >> > > > > > >> > > > queue, each picking apart the messages from source 1, 2 and 3.
>> > > > > > >> > > > > > >> > > > They'd go through separate enrichment and into the indexing topology. In
>> > > > > > >> > > > > > >> > > > the indexing topology, you could specify the same index name "syslog" and
>> > > > > > >> > > > > > >> > > > all of the messages go into the same index for CEP querying if so desired.
>> > > > > > >> > > > > > >> > > >
>> > > > > > >> > > > > > >> > > > On Thu, Jan 12, 2017 at 6:27 PM, Matt Foley <[email protected]> wrote:
>> > > > > > >> > > > > > >> > > >
>> > > > > > >> > > > > > >> > > > > Syslog is hell on parsers – I know, I worked at LogLogic in a previous
>> > > > > > >> > > > > > >> > > > > life. It makes perfect sense to route different lines from syslog through
>> > > > > > >> > > > > > >> > > > > different appropriate parsers. But a lot of what the parsers do is
>> > > > > > >> > > > > > >> > > > > identify consistent subsets of metadata and annotate it – eg, src_ip_addr,
>> > > > > > >> > > > > > >> > > > > event timestamps, etc.
>> > > > > > >> > > > > > >> > > > > Once those metadata are annotated and available with common field names,
>> > > > > > >> > > > > > >> > > > > why doesn’t it make sense to index the messages together, for CEP
>> > > > > > >> > > > > > >> > > > > querying? I think Splunk has illustrated this model.
>> > > > > > >> > > > > > >> > > > >
>> > > > > > >> > > > > > >> > > > > On 1/12/17, 3:00 PM, "Casey Stella" <[email protected]> wrote:
>> > > > > > >> > > > > > >> > > > >
>> > > > > > >> > > > > > >> > > > > yeah, I mean, honestly, I think the approach that we've taken for sources
>> > > > > > >> > > > > > >> > > > > which aggregate different types of data is to provide filters at the parser
>> > > > > > >> > > > > > >> > > > > level and have multiple parser topologies (with different, possibly
>> > > > > > >> > > > > > >> > > > > mutually exclusive filters) running. This would be a completely separate
>> > > > > > >> > > > > > >> > > > > sensor.
>> > > > > > >> > > > > > >> > > > > Imagine a syslog data source that aggregates and you want to pick apart
>> > > > > > >> > > > > > >> > > > > certain pieces of messages. This is why the initial thought and
>> > > > > > >> > > > > > >> > > > > architecture was one index per sensor.
>> > > > > > >> > > > > > >> > > > >
>> > > > > > >> > > > > > >> > > > > On Thu, Jan 12, 2017 at 5:55 PM, Matt Foley <[email protected]> wrote:
>> > > > > > >> > > > > > >> > > > >
>> > > > > > >> > > > > > >> > > > > > I’m thinking that CEP (Complex Event Processing) is contrary to the idea
>> > > > > > >> > > > > > >> > > > > > of silo-ing data per sensor.
>> > > > > > >> > > > > > >> > > > > > Now it’s true that some of those sensors are already aggregating data from
>> > > > > > >> > > > > > >> > > > > > multiple sources, so maybe I’m wrong here.
>> > > > > > >> > > > > > >> > > > > > But it just seems to me that the “data lake” insights come from being able
>> > > > > > >> > > > > > >> > > > > > to make decisions over the whole mass of data rather than just vertical
>> > > > > > >> > > > > > >> > > > > > slices of it.
>
> On 1/12/17, 2:15 PM, "Casey Stella" <[email protected]> wrote:
>
> Hey Matt,
>
> Thanks for the comment!
> 1. At the moment, we only have one index name, the default of which is
> the sensor name but that's entirely up to the user. This is sensor
> specific, so it'd be a separate config for each sensor. If we want to
> build multiple indices per sensor, we'd have to think carefully about how
> to do that and would be a bigger undertaking.
> I guess I can see the use, though (redirect messages to one index vs
> another based on a predicate for a given sensor).
> Anyway, not where I was originally thinking that this discussion would
> go, but it's an interesting point.
>
> 2. I hadn't thought through the implementation quite yet, but we don't
> actually have a splitter bolt in that topology, just a spout that goes to
> the elasticsearch writer and also to the hdfs writer.
>
> On Thu, Jan 12, 2017 at 4:52 PM, Matt Foley <[email protected]> wrote:
>
> Casey, good to have controls like this. Couple questions:
>
> 1. Regarding the “index” : “squid” name/value pair, is the index name
> expected to always be a sensor name? Or is the given json structure
> subordinate to a sensor name in zookeeper? Or can we build arbitrary
> indexes with this new specification, independent of sensor?
> Should there actually be a list of “indexes”, ie
> { “indexes” : [
>     {“index” : “name1”,
>      …
>     },
>     {“index” : “name2”,
>      …
>     } ]
> }
>
> 2. Would the filtering / writer selection logic take place in the
> indexing topology splitter bolt? Seems like that would have the smallest
> impact on current implementation, no?
>
> Sorry if these are already answered in PR-415, I haven’t had time to
> review that one yet.
> Thanks,
> --Matt
>
> On 1/12/17, 12:55 PM, "Michael Miklavcic" <[email protected]> wrote:
>
> I like the flexibility and expressibility of the first option with
> Stellar filters.
>
> M
>
> On Thu, Jan 12, 2017 at 1:51 PM, Casey Stella <[email protected]> wrote:
>
> As of METRON-652 <https://github.com/apache/incubator-metron/pull/415>,
> we will have decoupled the indexing configuration from the enrichment
> configuration.
> As an immediate follow-up to that, I'd like to provide the ability to
> turn off and on writers via the configs. I'd like to get some community
> feedback on how the functionality should work, if y'all are amenable. :)
>
> As of now, we have 3 possible writers which can be used in the indexing
> topology:
>
>    - Solr
>    - Elasticsearch
>    - HDFS
>
> HDFS is always used, elasticsearch or solr is used depending on how you
> start the indexing topology.
>
> A couple of proposals come to mind immediately:
>
> *Index Filtering*
>
> You would be able to specify a filter as defined by a stellar statement
> (likely a reuse of the StellarFilter that exists in the Parsers) which
> would allow you to indicate on a message-by-message basis whether or not
> to write the message.
>
> The semantics of this would be as follows:
>
>    - Default (i.e. unspecified) is to pass everything through (hence
>    backwards compatible with the current default config).
>    - Messages which have the associated stellar statement evaluate to
>    true for the writer type will be written, otherwise not.
>
> Sample indexing config which would write out no messages to HDFS and
> write out only messages containing a field called "field1":
> {
>    "index" : "squid"
>   ,"batchSize" : 100
>   ,"filters" : {
>        "HDFS" : "false"
>       ,"ES" : "exists(field1)"
>   }
> }
>
> *Index On/Off Switch*
>
> A simpler solution would be to just provide a list of writers to
> write messages. The semantics would be as follows:
>
>    - If the list is unspecified, then the default is to write all
>    messages for every writer in the indexing topology
>    - If the list is specified, then a writer will write all messages if
>    and only if it is named in the list.
>
> Sample indexing config which turns off HDFS and keeps on Elasticsearch:
> {
>    "index" : "squid"
>   ,"batchSize" : 100
>   ,"writers" : [ "ES" ]
> }
>
> --
> Jon
>
> Sent from my mobile device
>
> --
> Nick Allen <[email protected]>

-------------------
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
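
For anyone comparing the two proposals quoted above, here is a rough sketch of how their semantics might be modelled. It is only an illustration in Python, not Metron code: the function names passes_filter and writer_enabled are hypothetical, and the STATEMENTS table is a stand-in for a real Stellar evaluator, which is deliberately not shown. The two config dictionaries mirror the sample configs from the thread.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]
Predicate = Callable[[Message], bool]

# Hypothetical stand-ins for the two Stellar statements in the sample config.
# A real implementation would pass the statement string to a Stellar
# evaluator; that call is an assumption and is not modelled here.
STATEMENTS: Dict[str, Predicate] = {
    "false": lambda msg: False,
    "exists(field1)": lambda msg: "field1" in msg,
}


def passes_filter(config: Dict[str, Any], writer: str, message: Message) -> bool:
    """Index Filtering: a writer with no filter passes everything through."""
    statement = config.get("filters", {}).get(writer)
    if statement is None:
        return True  # unspecified, so keep the current default behaviour
    return STATEMENTS[statement](message)


def writer_enabled(config: Dict[str, Any], writer: str) -> bool:
    """Index On/Off Switch: with no list given, every writer stays on."""
    writers = config.get("writers")
    if writers is None:
        return True  # unspecified, so keep the current default behaviour
    return writer in writers


# The two sample configs from the thread, as Python dictionaries.
filter_config = {
    "index": "squid",
    "batchSize": 100,
    "filters": {"HDFS": "false", "ES": "exists(field1)"},
}
switch_config = {"index": "squid", "batchSize": 100, "writers": ["ES"]}
```

In this sketch both proposals default to "write everything" when the relevant key is absent, which is the backwards-compatible behaviour described in the thread. The filter variant decides per message, while the switch variant decides once per writer.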
