The original string serves purposes well beyond debugging. Many users will
need to be able to prove provenance back to the raw logs in order to prove
or prosecute an attack by an internal threat, or to provide evidence to law
enforcement about an external threat. As such, the original string is
important.

It also provides a valuable source for free-text search where parsing has
not extracted all the tokens needed for a hunt use case, so it can be a
valuable field to have in Elastic or Solr with text rather than keyword
indexing.
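To illustrate why text indexing of the original string helps hunting, here
is a minimal Python sketch. It contrasts a tokenized ("text"-style) inverted
index with an exact-match ("keyword"-style) index. The tokenizer is a
deliberately simplified stand-in for an Elasticsearch or Solr analyzer. The
field name `original_string` and the sample log lines are assumptions made
for illustration.

```python
import re
from collections import defaultdict

# Simplified stand-in for an analyzer (an assumption, not the real
# Elasticsearch/Solr analysis chain): lowercase, then split on non-word chars.
def tokenize(text):
    return re.findall(r"\w+", text.lower())

def build_text_index(docs):
    """Inverted index: token -> doc ids, roughly like a 'text' field."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in tokenize(doc["original_string"]):
            index[token].add(doc_id)
    return index

def build_keyword_index(docs):
    """Exact-value index: whole string -> doc ids, like a 'keyword' field."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        index[doc["original_string"]].add(doc_id)
    return index

# Made-up raw log lines; assume the parser extracted no field for the binary name.
docs = {
    1: {"original_string": "Jun 25 23:43:01 host1 proc[42]: spawned powershell.exe -enc AAAA"},
    2: {"original_string": "Jun 25 23:44:10 host2 sshd[7]: Accepted publickey for alice"},
}

text_index = build_text_index(docs)
keyword_index = build_keyword_index(docs)

# Hunting for a token the parser never pulled out into its own field:
print(sorted(text_index.get("powershell", set())))     # [1]
print(sorted(keyword_index.get("powershell", set())))  # []
```

Only the tokenized index finds the hit. The keyword index would only
match if the analyst already knew the entire raw line.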

That said, it may make sense to remove a heavyweight processing and storage
field like this from the Lucene store. We have been talking for a while
about filtering some of the data out of the realtime index while preserving
full copies in the batch index, which could meet the forensic use cases
above and would make it a matter of user choice. That would probably be
configured through the indexing config to filter fields.
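A minimal Python sketch of that kind of per-destination field filtering
follows. The config keys (`realtime`, `batch`, `exclude_fields`) and the
`filter_for_writer` helper are hypothetical. They are not Metron's actual
indexing config schema. The point is that each writer gets its own copy of
the message with its own excluded fields, so the batch copy keeps the full
record.

```python
# Hypothetical per-writer config; keys are illustrative only and are NOT
# Metron's real indexing configuration schema.
INDEXING_CONFIG = {
    "realtime": {"exclude_fields": {"original_string"}},  # e.g. Elastic/Solr
    "batch": {"exclude_fields": set()},                   # e.g. HDFS, full copy
}

def filter_for_writer(message, writer, config=INDEXING_CONFIG):
    """Return a new dict without the writer's excluded fields (input is not mutated)."""
    excluded = config[writer]["exclude_fields"]
    return {key: value for key, value in message.items() if key not in excluded}

# Made-up parsed message for illustration.
message = {
    "ip_src_addr": "10.0.0.5",
    "original_string": "1529966581.123 10.0.0.5 TCP_MISS/200 ...",
}

realtime_doc = filter_for_writer(message, "realtime")
batch_doc = filter_for_writer(message, "batch")
print("original_string" in realtime_doc)  # False
print("original_string" in batch_doc)     # True
```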

Simon

On 25 June 2018 at 23:43, Michel Sumbul <michelsum...@gmail.com> wrote:

> Depending on the source of data, it might be interesting to bypass a step
> that the user considers useless.
> For example, if you have a source of data that doesn't need profiling, you
> may still want it ingested like the other sources so that SOC analysts can
> use it in their analysis, with everything in the same place.
>
> How can we bypass it for a specific sensor?
>
> 2018-06-25 23:38 GMT+01:00 James Sirota <jsir...@apache.org>:
>
> > There is a way to wire the system to bypass enrichment and profiling, but
> > you would then bypass a lot of key features of the system.  It would be
> > unwise to do that.
> >
> > 25.06.2018, 15:13, "Michel Sumbul" <michelsum...@gmail.com>:
> > > Hi Casey,
> > >
> > > That makes complete sense.
> > > A short question: if there is no enrichment or profiling, does the
> > > message still pass through the enrichment/profiling topic?
> > >
> > > If yes, do you think it's possible to find a way for messages that
> > > don't need enrichment or profiling to skip the topic and go directly
> > > to the next one? Again, this is to avoid the in/out in Kafka.
> > >
> > > Thanks for the explanation,
> > > Michel
> > >
> > > 2018-06-23 3:58 GMT+01:00 Casey Stella <ceste...@gmail.com>:
> > >
> > >>  Hey Michel,
> > >>
> > >>  Those are good questions, and there were some reasons surrounding that.
> > >>  In fact, historically, we had fewer topologies (e.g. indexing and
> > >>  enrichment were merged). Even earlier on, we had just one giant
> > >>  topology per parser that enriched and indexed. The long story short is
> > >>  that we moved this way because we saw how people were using Metron and
> > >>  we gained more insight into tuning Metron. That led us down this
> > >>  architectural path.
> > >>
> > >>  Some of the reasons that we went this way:
> > >>
> > >>     - Fewer, larger topologies were a nightmare to tune
> > >>        - Enrichment would have different memory requirements than,
> > >>          say, parsers or indexing
> > >>        - You can adjust the Kafka topic params per topology to adjust
> > >>          the number of partitions, etc.
> > >>     - Having separate topologies gives a natural set of extension
> > >>       points for customization and enhancement (e.g. you want a phase
> > >>       between parsing and enrichment).
> > >>     - Decoupling the topologies lets us spin up and down parts of
> > >>       Metron without affecting others (e.g. you don't have to take
> > >>       down enrichments to add a parser, even for a moment)
> > >>     - The move to Flux meant we were limited in how much we could
> > >>       adjust the topology at runtime (e.g. colocating parsers and
> > >>       enrichment would essentially mean moving away from Flux, as the
> > >>       topology changes its structure)
> > >>
> > >>  Best,
> > >>
> > >>  Casey
> > >>
> > >>  On Fri, Jun 22, 2018 at 5:25 PM Michel Sumbul <michelsum...@gmail.com>
> > >>  wrote:
> > >>
> > >>  > Hi Everyone,
> > >>  >
> > >>  > I was asking myself what the architectural reason was for splitting
> > >>  > ingestion in Metron into 4 different topologies that all read/write
> > >>  > to Kafka.
> > >>  >
> > >>  > For example, why have the parsing and enrichment topologies not been
> > >>  > merged? Would it not be possible to enrich the message directly when
> > >>  > you parse it?
> > >>  >
> > >>  > I'm asking because splitting into several topologies means that all
> > >>  > of the topologies read/write to Kafka, which produces a bigger load
> > >>  > on the Kafka cluster and therefore a need for far more
> > >>  > infrastructure/servers. The cost is especially significant when we
> > >>  > speak about TBs of data ingested every day.
> > >>  >
> > >>  > I'm sure there was a very good reason; I was just curious.
> > >>  >
> > >>  > Thanks,
> > >>  > Michel
> > >>  >
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PMC- Apache Metron
> > jsirota AT apache DOT org
> >
> >
>
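The Kafka round-trip concern in this thread and the per-sensor bypass
question can be put into rough numbers with a small Python model. Every
detail below is an assumption made for illustration:

- a linear parsing, enrichment, indexing path (profiling omitted);
- a hypothetical per-sensor skip list that is not a real Metron config key;
- made-up sensor names;
- a replication factor of 3;
- message sizes that stay constant between stages (enrichment actually makes
  them larger).

As James points out, bypassing enrichment also gives up key features. Any
saving has to be weighed against that.

```python
# Assumed linear path for this sketch; profiling is left out for simplicity.
PIPELINE = ["parsing", "enrichment", "indexing"]

def stages_for(sensor, skip_by_sensor):
    """Stages a sensor's messages pass through, honouring a hypothetical
    per-sensor skip list (not a real Metron configuration key)."""
    skipped = skip_by_sensor.get(sensor, set())
    return [stage for stage in PIPELINE if stage not in skipped]

def kafka_topics_written(stages):
    """Topics each message is written to: the raw sensor topic feeding the
    first stage plus one topic between each pair of consecutive stages.
    The last stage writes to Elastic/Solr/HDFS rather than Kafka."""
    return len(stages)

def daily_broker_writes_tb(ingest_tb_per_day, stages, replication_factor=3):
    """Approximate TB/day written to broker disks, assuming message size is
    unchanged between stages (in practice enrichment grows messages)."""
    return ingest_tb_per_day * kafka_topics_written(stages) * replication_factor

# Hypothetical sensors: "netflow" is configured to skip enrichment.
skip_by_sensor = {"netflow": {"enrichment"}}

full = stages_for("squid", skip_by_sensor)
bypass = stages_for("netflow", skip_by_sensor)
print(full, daily_broker_writes_tb(1.0, full))      # ['parsing', 'enrichment', 'indexing'] 9.0
print(bypass, daily_broker_writes_tb(1.0, bypass))  # ['parsing', 'indexing'] 6.0
```

Under these assumptions, each stage removed from a sensor's path saves one
replicated write of that sensor's daily volume. That is the cost Michel
raises, set against the tuning and isolation benefits Casey describes.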



-- 
simon elliston ball
@sireb
