Re: [DISCUSS] Turning off indexing writers feature discussion

zeo...@gmail.com Fri, 13 Jan 2017 08:56:13 -0800

I think Simon has a very valid suggestion.  Additionally, I have two
questions.  For the following config:


{
  "index" : "foo"
 ,"batchSize" : 100
}

Are all logs now going to the same index?  I read this as a writer-specific
override of the sensor-specific defaults to use an index name of foo* (in
HDFS that's foo, in ES that's foo-${timestamp}).  If that's true, would
something like this work?

{
 "batchSize" : 100
 , "writerConfig" :
   {
      "elasticsearch" : {
                                   "when" : "exists(field1)",
                                   "index" : "+foo"
                                 }
   }
}

How I read this is, set a default batchSize of 100, but for each index
(holding to the sensor-specific defaults), specify an override for
elasticsearch to send to the index foo when field1 exists.  The result in
my mind would be that the sensor-specific default and foo both get this log
line, if field1 exists.
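
The layering described above (sensor-specific defaults, then top-level
keys, then a per-writer entry under "writerConfig") can be sketched roughly
as follows. This is only an illustration in Python, not Metron code: the
function name effective_writer_config and the sensor name "squid" are
hypothetical, the fallbacks (index named after the sensor, batchSize of 1)
are taken from the base case quoted further down this thread, and the
override here uses a plain "foo", i.e. it replaces the index rather than
appending to it.

```python
def effective_writer_config(sensor, config, writer):
    """Resolve the settings one writer would see (hypothetical helper).

    Precedence, lowest to highest: sensor-specific defaults, top-level
    keys of the config, then the writer's entry under "writerConfig".
    """
    # Fallbacks assumed from the base case discussed in this thread.
    resolved = {"index": sensor, "batchSize": 1, "when": "true"}
    # Top-level keys act as defaults shared by every writer.
    for key in ("index", "batchSize"):
        if key in config:
            resolved[key] = config[key]
    # Writer-specific entries override the shared defaults.
    resolved.update(config.get("writerConfig", {}).get(writer, {}))
    return resolved


config = {
    "batchSize": 100,
    "writerConfig": {
        "elasticsearch": {"when": "exists(field1)", "index": "foo"},
    },
}

es = effective_writer_config("squid", config, "elasticsearch")
hdfs = effective_writer_config("squid", config, "hdfs")
print(es)
print(hdfs)
```

Under these assumptions the elasticsearch writer ends up with index foo,
batchSize 100 and the exists(field1) filter, while hdfs keeps the sensor
name as its index and only picks up the shared batchSize.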

Of course the syntax I used for "+foo" is probably not optimal, just
illustrative that it's appending an additional index to send to, as opposed
to overwriting the destination index (if you didn't add the +).  In fact,
the more I look at it, this appears to be a bad approach, but I'm struggling
to think of an exact, cleaner solution to suggest offhand.  Something that
does if (exists(field1)) index += foo.
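
To make the append-versus-overwrite distinction concrete, here is a small
sketch in Python. Everything in it is an assumption for illustration: the
function name target_indices is made up, the "when" predicate is a plain
Python callable standing in for a Stellar expression, and the leading "+"
is just the placeholder syntax from this mail, not an existing feature.

```python
def target_indices(default_index, override, message):
    """Return the indices a message would be written to (hypothetical)."""
    index = override.get("index", default_index)
    when = override.get("when", lambda msg: True)
    if index.startswith("+"):
        # Append: the default index always gets the message, the extra
        # index only when the predicate holds.
        extra = [index[1:]] if when(message) else []
        return [default_index] + extra
    # Overwrite: the predicate decides whether the writer writes at all.
    return [index] if when(message) else []


has_field1 = lambda msg: "field1" in msg
append = {"when": has_field1, "index": "+foo"}
overwrite = {"when": has_field1, "index": "foo"}

print(target_indices("squid", append, {"field1": "x"}))
print(target_indices("squid", append, {"other": 1}))
print(target_indices("squid", overwrite, {"field1": "x"}))
```

With the "+" form a message carrying field1 goes to both the sensor default
and foo; without the "+" it goes to foo only, and a message lacking field1
is not written by that writer.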

Also, as previously discussed, this could easily be a follow-on enhancement.

Jon

On Fri, Jan 13, 2017 at 11:18 AM David Lyle <dlyle65...@gmail.com> wrote:

Thanks Casey!

I think I had the right of it, but wanted to make sure.

I'm +1 on defaults in global with overrides in sensor-specific. At least in
the first iteration. I (like Otto) suspect we'll have a few go-arounds on
this.

-D...


On Fri, Jan 13, 2017 at 11:09 AM, Otto Fowler <ottobackwa...@gmail.com>
wrote:

> This is an excellent point
>
>
> On January 13, 2017 at 10:54:07, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> Something else to consider here is the possibility of multiple indices
> within a given target technology.
>
> For example, if I’m indexing data from a given sensor into, say, Solr, I
> may want it filtered differently into two different indices. This would
> enable me to create different ‘views’ which could have different security
> settings applied in that backend. This would be useful for multi-tenant
> installs, and for differing data privilege levels within an organisation.
> You could argue that this is more a concern for filtering of the results
> coming out of an index, but currently this is a lot harder than using
> something like the Ranger Solr authorisation plugin to control access at
> an index-by-index granularity.
>
> Essentially, the indexer topology then becomes a filter and router, which
> argues for it being a separate step, before the process which actually
> writes out to each platform. It may also make sense to have a concept of a
> routing key built up by earlier enrichment to allow shuffle control in
> Storm, rather than a full Stellar statement for routing, to avoid
> overhead.
>
> Simon
>
> > On 13 Jan 2017, at 07:44, Casey Stella <ceste...@gmail.com> wrote:
> >
> > I am suggesting that, yes. The configs are essentially the same as
> > yours, except there is an override specified at the top level. Without
> > that, in order to specify that both HDFS and ES have batch sizes of 100,
> > you have to explicitly configure each. It's less that I'm trying to have
> > backwards compatibility and more that I'm trying to make the majority
> > case easy: both writers write everything to a specified index name with
> > a specified batch size (which is what we have now). Beyond that, I want
> > to allow for specifying an override for the config on a
> > writer-by-writer basis for those who need it.
> >
> > On Fri, Jan 13, 2017 at 10:39 AM, Nick Allen <n...@nickallen.org> wrote:
> >
> >> Are you saying we support all of these variants? I realize you are
> >> trying to have some backwards compatibility, but this also makes it
> >> harder for a user to grok (for me at least).
> >>
> >> Personally I like my original example as there are fewer
> >> sub-structures, like 'writerConfig', which makes the whole thing simpler
> >> and easier to grok. But maybe others will think your proposal is just as
> >> easy to grok.
> >>
> >>
> >>
> >> On Fri, Jan 13, 2017 at 10:01 AM, Casey Stella <ceste...@gmail.com>
> wrote:
> >>
> >>> Ok, so here's what I'm thinking based on the discussion:
> >>>
> >>> - Keeping the configs that we have now (batchSize and index) as
> >>> defaults for the unspecified writer-specific case
> >>> - Adding the config Nick suggested
> >>>
> >>> *Base Case*:
> >>> {
> >>> }
> >>>
> >>> - all writers write all messages
> >>> - index named the same as the sensor for all writers
> >>> - batchSize of 1 for all writers
> >>>
> >>> *Writer-non-specific case*:
> >>> {
> >>> "index" : "foo"
> >>> ,"batchSize" : 100
> >>> }
> >>>
> >>> - All writers write all messages
> >>> - index is named "foo", different from the sensor for all writers
> >>> - batchSize is 100 for all writers
> >>>
> >>> *Writer-specific case without filters*
> >>> {
> >>>   "index" : "foo"
> >>>  ,"batchSize" : 1
> >>>  ,"writerConfig" :
> >>>    {
> >>>      "elasticsearch" : {
> >>>        "batchSize" : 100
> >>>      }
> >>>    }
> >>> }
> >>>
> >>> - All writers write all messages
> >>> - index is named "foo", different from the sensor for all writers
> >>> - batchSize is 1 for HDFS and 100 for elasticsearch writers
> >>> - NOTE: I could override the index name too
> >>>
> >>> *Writer-specific case with filters*
> >>> {
> >>> "index" : "foo"
> >>> ,"batchSize" : 1
> >>> , "writerConfig" :
> >>> {
> >>> "elasticsearch" : {
> >>> "batchSize" : 100,
> >>> "when" : "exists(field1)"
> >>> },
> >>> "hdfs" : {
> >>> "when" : "false"
> >>> }
> >>> }
> >>> }
> >>>
> >>> - ES writer writes messages which have field1, HDFS doesn't
> >>> - index is named "foo", different from the sensor for all writers
> >>> - batchSize is 100 for elasticsearch writers
> >>>
> >>> Thoughts?
> >>>
> >>> On Fri, Jan 13, 2017 at 9:44 AM, Carolyn Duby <cd...@hortonworks.com>
> >>> wrote:
> >>>
> >>>> For larger installations you need to control what is indexed so you
> >>>> don’t end up with a nasty Elasticsearch situation and so you can mine
> >>>> the data later for reports and training ML models.
> >>>>
> >>>> Thanks
> >>>> Carolyn
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 1/13/17, 9:40 AM, "Casey Stella" <ceste...@gmail.com> wrote:
> >>>>
> >>>>> OH that's a good idea!
> >>>>>
> >>>>> On Fri, Jan 13, 2017 at 9:39 AM, Nick Allen <n...@nickallen.org>
> >> wrote:
> >>>>>
> >>>>>> I like the "Index Filtering" option based on the flexibility that
> >>>>>> it provides. Should each output (HDFS, ES, etc) have its own
> >>>>>> configuration settings? For example, aren't things like batching
> >>>>>> handled separately for HDFS versus Elasticsearch?
> >>>>>>
> >>>>>> Something along the lines of...
> >>>>>>
> >>>>>> {
> >>>>>>   "hdfs" : {
> >>>>>>     "when": "exists(field1)",
> >>>>>>     "batchSize": 100
> >>>>>>   },
> >>>>>>
> >>>>>>   "elasticsearch" : {
> >>>>>>     "when": "true",
> >>>>>>     "batchSize": 1000,
> >>>>>>     "index": "squid"
> >>>>>>   }
> >>>>>> }
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jan 13, 2017 at 9:10 AM, Casey Stella <ceste...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> Yeah, I tend to like the first option too. Any opposition to that
> >>>>>>> from anyone?
> >>>>>>>
> >>>>>>> The points brought up are good ones and I think that it may be
> >>>>>>> worth a broader discussion of the requirements of indexing in a
> >>>>>>> separate dev list thread. Maybe a list of desires with coherent
> >>>>>>> use-cases justifying them so we can think about how this stuff
> >>>>>>> should work and where the natural extension points should be.
> >>>>>>> After all, we need to toe the line between engineering and
> >>>>>>> overengineering for features nobody will want.
> >>>>>>>
> >>>>>>> I'm not sure about the extensions to the standard fields. I'm torn
> >>>>>>> between the notions that we should have no standard fields vs we
> >>>>>>> should have a boatload of standard fields (with most of them
> >>>>>>> empty). I exchange positions fairly regularly on that question. ;)
> >>>>>>> It may be worth a dev list discussion to lay out how you imagine an
> >>>>>>> extension of standard fields and how it might look as implemented
> >>>>>>> in Metron.
> >>>>>>>
> >>>>>>> Casey
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Jan 12, 2017 at 9:58 PM, Kyle Richardson <
> >>>>>>> kylerichards...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I'll second my preference for the first option. I think the
> >>>>>>>> ability to use Stellar filters to customize indexing would be a
> >>>>>>>> big win.
> >>>>>>>>
> >>>>>>>> I'm glad Matt brought up the point about data lake and CEP. I
> >>>>>>>> think this is a really important use case that we need to
> >>>>>>>> consider. Take a simple example... If I have data coming in from 3
> >>>>>>>> different firewall vendors and 2 different web proxy/url filtering
> >>>>>>>> vendors and I want to be able to analyze that data set, I need the
> >>>>>>>> data to be indexed all together (likely in HDFS) and to have a
> >>>>>>>> normalized schema such that IP address, URL, and user name (to
> >>>>>>>> take a few) can be easily queried and aggregated. I can also
> >>>>>>>> envision scenarios where I would want to index data based on
> >>>>>>>> attributes other than sensor, business unit or subsidiary for
> >>>>>>>> example.
> >>>>>>>>
> >>>>>>>> I've been wanting to propose extending our 7 standard fields to
> >>>>>>>> include things like URL and user. Is there community
> >>>>>>>> interest/support for moving in that direction? If so, I'll start a
> >>>>>>>> new thread.
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> -Kyle
> >>>>>>>>
> >>>>>>>> On Thu, Jan 12, 2017 at 6:51 PM, Matt Foley <ma...@apache.org>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>>> Ah, I see. If overriding the default index name allows using the
> >>>>>>>>> same name for multiple sensors, then the goal can be achieved.
> >>>>>>>>> Thanks,
> >>>>>>>>> --Matt
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 1/12/17, 3:30 PM, "Casey Stella" <ceste...@gmail.com>
> >> wrote:
> >>>>>>>>>
> >>>>>>>>> Oh, you could! Let's say you have a syslog parser with data from
> >>>>>>>>> sources 1, 2 and 3. You'd end up with one kafka queue with 3
> >>>>>>>>> parsers attached to that queue, each picking apart the messages
> >>>>>>>>> from source 1, 2 and 3. They'd go through separate enrichment and
> >>>>>>>>> into the indexing topology. In the indexing topology, you could
> >>>>>>>>> specify the same index name "syslog" and all of the messages go
> >>>>>>>>> into the same index for CEP querying if so desired.
> >>>>>>>>>
> >>>>>>>>> On Thu, Jan 12, 2017 at 6:27 PM, Matt Foley <
> >>> ma...@apache.org
> >>>>>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Syslog is hell on parsers – I know, I worked at LogLogic in a
> >>>>>>>>>> previous life. It makes perfect sense to route different lines
> >>>>>>>>>> from syslog through different appropriate parsers. But a lot of
> >>>>>>>>>> what the parsers do is identify consistent subsets of metadata
> >>>>>>>>>> and annotate it – eg, src_ip_addr, event timestamps, etc. Once
> >>>>>>>>>> those metadata are annotated and available with common field
> >>>>>>>>>> names, why doesn’t it make sense to index the messages together,
> >>>>>>>>>> for CEP querying? I think Splunk has illustrated this model.
> >>>>>>>>>>
> >>>>>>>>>> On 1/12/17, 3:00 PM, "Casey Stella" <ceste...@gmail.com
> >>>
> >>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> yeah, I mean, honestly, I think the approach that we've taken
> >>>>>>>>>> for sources which aggregate different types of data is to
> >>>>>>>>>> provide filters at the parser level and have multiple parser
> >>>>>>>>>> topologies (with different, possibly mutually exclusive filters)
> >>>>>>>>>> running. This would be a completely separate sensor. Imagine a
> >>>>>>>>>> syslog data source that aggregates and you want to pick apart
> >>>>>>>>>> certain pieces of messages. This is why the initial thought and
> >>>>>>>>>> architecture was one index per sensor.
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jan 12, 2017 at 5:55 PM, Matt Foley <
> >>>>>>> ma...@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I’m thinking that CEP (Complex Event Processing) is contrary to
> >>>>>>>>>>> the idea of silo-ing data per sensor.
> >>>>>>>>>>> Now it’s true that some of those sensors are already
> >>>>>>>>>>> aggregating data from multiple sources, so maybe I’m wrong
> >>>>>>>>>>> here.
> >>>>>>>>>>> But it just seems to me that the “data lake” insights come from
> >>>>>>>>>>> being able to make decisions over the whole mass of data rather
> >>>>>>>>>>> than just vertical slices of it.
> >>>>>>>>>>>
> >>>>>>>>>>> On 1/12/17, 2:15 PM, "Casey Stella" <
> >>>> ceste...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hey Matt,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for the comment!
> >>>>>>>>>>> 1. At the moment, we only have one index name, the default of
> >>>>>>>>>>> which is the sensor name but that's entirely up to the user.
> >>>>>>>>>>> This is sensor specific, so it'd be a separate config for each
> >>>>>>>>>>> sensor. If we want to build multiple indices per sensor, we'd
> >>>>>>>>>>> have to think carefully about how to do that and it would be a
> >>>>>>>>>>> bigger undertaking. I guess I can see the use, though (redirect
> >>>>>>>>>>> messages to one index vs another based on a predicate for a
> >>>>>>>>>>> given sensor). Anyway, not where I was originally thinking that
> >>>>>>>>>>> this discussion would go, but it's an interesting point.
> >>>>>>>>>>>
> >>>>>>>>>>> 2. I hadn't thought through the implementation quite yet, but
> >>>>>>>>>>> we don't actually have a splitter bolt in that topology, just a
> >>>>>>>>>>> spout that goes to the elasticsearch writer and also to the
> >>>>>>>>>>> hdfs writer.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Jan 12, 2017 at 4:52 PM, Matt Foley <
> >>>>>>>>> ma...@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Casey, good to have controls like this. Couple questions:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. Regarding the “index” : “squid” name/value pair, is the
> >>>>>>>>>>>> index name expected to always be a sensor name? Or is the
> >>>>>>>>>>>> given json structure subordinate to a sensor name in
> >>>>>>>>>>>> zookeeper? Or can we build arbitrary indexes with this new
> >>>>>>>>>>>> specification, independent of sensor? Should there actually be
> >>>>>>>>>>>> a list of “indexes”, ie
> >>>>>>>>>>>> { “indexes” : [
> >>>>>>>>>>>> {“index” : “name1”,
> >>>>>>>>>>>> …
> >>>>>>>>>>>> },
> >>>>>>>>>>>> {“index” : “name2”,
> >>>>>>>>>>>> …
> >>>>>>>>>>>> } ]
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Would the filtering / writer selection logic take place in
> >>>>>>>>>>>> the indexing topology splitter bolt? Seems like that would
> >>>>>>>>>>>> have the smallest impact on current implementation, no?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sorry if these are already answered in PR-415, I haven’t had
> >>>>>>>>>>>> time to review that one yet.
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> --Matt
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 1/12/17, 12:55 PM, "Michael Miklavcic" <
> >>>>>>>>>>> michael.miklav...@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I like the flexibility and expressibility of the first option
> >>>>>>>>>>>> with Stellar filters.
> >>>>>>>>>>>>
> >>>>>>>>>>>> M
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Jan 12, 2017 at 1:51 PM, Casey
> >>>> Stella <
> >>>>>>>>>>> ceste...@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> As of METRON-652
> >>>>>>>>>>>>> <https://github.com/apache/incubator-metron/pull/415>, we
> >>>>>>>>>>>>> will have decoupled the indexing configuration from the
> >>>>>>>>>>>>> enrichment configuration. As an immediate follow-up to that,
> >>>>>>>>>>>>> I'd like to provide the ability to turn off and on writers
> >>>>>>>>>>>>> via the configs. I'd like to get some community feedback on
> >>>>>>>>>>>>> how the functionality should work, if y'all are amenable. :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> As of now, we have 3 possible writers which can be used in
> >>>>>>>>>>>>> the indexing topology:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - Solr
> >>>>>>>>>>>>> - Elasticsearch
> >>>>>>>>>>>>> - HDFS
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> HDFS is always used, elasticsearch or solr is used depending
> >>>>>>>>>>>>> on how you start the indexing topology.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> A couple of proposals come to mind immediately:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Index Filtering*
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You would be able to specify a filter as defined by a stellar
> >>>>>>>>>>>>> statement (likely a reuse of the StellarFilter that exists in
> >>>>>>>>>>>>> the Parsers) which would allow you to indicate on a
> >>>>>>>>>>>>> message-by-message basis whether or not to write the message.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The semantics of this would be as follows:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - Default (i.e. unspecified) is to pass everything through
> >>>>>>>>>>>>> (hence backwards compatible with the current default config).
> >>>>>>>>>>>>> - Messages which have the associated stellar statement
> >>>>>>>>>>>>> evaluate to true for the writer type will be written,
> >>>>>>>>>>>>> otherwise not.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sample indexing config which would write out no messages to
> >>>>>>>>>>>>> HDFS and write out to ES only messages containing a field
> >>>>>>>>>>>>> called "field1":
> >>>>>>>>>>>>> {
> >>>>>>>>>>>>> "index" : "squid"
> >>>>>>>>>>>>> ,"batchSize" : 100
> >>>>>>>>>>>>> ,"filters" : {
> >>>>>>>>>>>>> "HDFS" : "false"
> >>>>>>>>>>>>> ,"ES" : "exists(field1)"
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Index On/Off Switch*
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> A simpler solution would be to just provide a list of writers
> >>>>>>>>>>>>> to write messages. The semantics would be as follows:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - If the list is unspecified, then the default is to write
> >>>>>>>>>>>>> all messages for every writer in the indexing topology
> >>>>>>>>>>>>> - If the list is specified, then a writer will write all
> >>>>>>>>>>>>> messages if and only if it is named in the list.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sample indexing config which turns off HDFS and keeps on
> >>>>>>>>>>>>> Elasticsearch:
> >>>>>>>>>>>>> {
> >>>>>>>>>>>>> "index" : "squid"
> >>>>>>>>>>>>> ,"batchSize" : 100
> >>>>>>>>>>>>> ,"writers" : [ "ES" ]
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks in advance for the feedback! Also, if you have any
> >>>>>>>>>>>>> other, better ideas than the ones presented here, let me know
> >>>>>>>>>>>>> too.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Casey
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Nick Allen <n...@nickallen.org>
> >>>>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Nick Allen <n...@nickallen.org>
> >>
>
>

-- 

Jon

Sent from my mobile device
