I definitely agree that you need this level of understanding of your cluster, and it could certainly work the way that you describe.

I was thinking of it slightly differently, though. The metrics for this purpose (understanding the performance of an existing cluster) should come from the actual sensors themselves. For example, I need to instrument the packet capture process so that it kicks out time-series-ish metrics that you can monitor in a dashboard over time.

On Fri, Apr 15, 2016 at 1:40 PM, [email protected] <[email protected]> wrote:

> However, it would be handy to have something like this perpetually running so you know when to scale up/out/down/in a cluster.
>
> On Fri, Apr 15, 2016, 13:35 Nick Allen <[email protected]> wrote:
>
> > I think it is slightly different. I don't even want to install minimal Kafka infrastructure (Look ma, no Kafka!).
> >
> > The exact implementation would differ based on the data inputs that you are trying to measure, but for example...
> >
> >    - To understand raw packet rates, I would have a specialized sensor that counts packets and size on the wire. It doesn't do anything more than that.
> >    - To understand Netflow rates, it would watch for Netflow packets and count those.
> >    - To understand sizing around application logs, a sensor would watch for Syslog packets and count those.
> >
> > The implementation would be more similar to raw packet capture with some DPI. No Hadoop-y components required.
> >
> > On Fri, Apr 15, 2016 at 1:10 PM, James Sirota <[email protected]> wrote:
> >
> > > So this is exactly what I am proposing: calculate the metrics on the fly without landing any data in the cluster. The problem is that enterprise data volumes are so large you can't just point them at a Java or C++ program or sensor. You either need an existing minimal Kafka infrastructure to take that load, or you sample the data.
> > >
> > > Thanks,
> > > James
> > >
> > > On 4/15/16, 9:54 AM, "Nick Allen" <[email protected]> wrote:
> > >
> > > > Or we have the assessment tool not actually land any data. The assessment tool becomes a 'sensor' in its own right. You just point the input data sets at the assessment tool, it builds metrics on the input (for example: count the number of packets per second), and then we use those metrics to estimate cluster size.
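As a rough illustration of the standalone "sensor" being described here, this is a minimal sketch assuming a plain-JDK implementation that counts Syslog messages arriving over UDP. The class name and output format are hypothetical; a raw-packet or Netflow variant would differ only in how it captures and delimits events.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical standalone "assessment sensor": counts Syslog messages and
 * bytes received on a UDP port and prints per-interval metrics. No Kafka,
 * no Hadoop; just a socket and a timer.
 */
public class SyslogRateSensor {

  private static final LongAdder events = new LongAdder();
  private static final LongAdder bytes = new LongAdder();

  public static void main(String[] args) throws Exception {
    // Emit a metrics sample every 10 seconds: events/sec and average message size.
    ScheduledExecutorService reporter = Executors.newSingleThreadScheduledExecutor();
    reporter.scheduleAtFixedRate(() -> {
      long e = events.sumThenReset();
      long b = bytes.sumThenReset();
      double avgSize = e > 0 ? (double) b / e : 0.0;
      System.out.printf("timestamp=%d events=%d events/sec=%.1f avgBytes=%.1f%n",
          System.currentTimeMillis(), e, e / 10.0, avgSize);
    }, 10, 10, TimeUnit.SECONDS);

    // Count every datagram that arrives on the syslog port; do nothing else with it.
    // (Binding to 514 typically requires elevated privileges.)
    try (DatagramSocket socket = new DatagramSocket(514)) {
      byte[] buf = new byte[65535];
      DatagramPacket packet = new DatagramPacket(buf, buf.length);
      while (true) {
        socket.receive(packet);
        events.increment();
        bytes.add(packet.getLength());
      }
    }
  }
}
```

A pcap-based variant would swap the socket for a capture library, but the shape is the same: count, sum sizes, report on an interval, and keep nothing.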
> > > > On Wed, Apr 13, 2016 at 5:45 PM, James Sirota <[email protected]> wrote:
> > > >
> > > > > That's an excellent point. So I think there are three ways forward.
> > > > >
> > > > > One is we can assume that there has to be at least a minimal infrastructure in place (at least a subset of Kafka and Storm resources) to run a full-scale assessment. If you point something that blasts millions of messages per second at something like ActiveMQ, you are going to blow up. So the infrastructure to at least receive these kinds of message volumes has to exist as a prerequisite. There is no way to get around that.
> > > > >
> > > > > The second approach I see is sampling. Sampling is a lot less precise, and you can miss peaks that fall outside of your sampling windows. But the obvious benefit is that you don't need a cluster to process these streams. You can probably perform most of your calculations with a multithreaded Java program. Sampling poses a few design challenges. First, where do you sample? Do you sample on the sensor? (The implication is that we would have to program some sort of sampling capability into our sensors.) Do you sample on transport (maybe a Flume interceptor or NiFi processor)? There is also the question of what the sampling rate should be. Not knowing the statistical properties of a stream ahead of time, it's hard to make that call.
> > > > >
> > > > > The third option, I think, is an MR job. We can blast the data into HDFS and then go over it with MR to derive the metrics we are looking for. Then we don't have to sample or set up expensive infrastructure to receive a deluge of data. But then we run into the chicken-and-egg problem that, in order to size your HDFS, you need to have data in HDFS. Ideally you need to capture at least one full week's worth of logs, because patterns throughout the day as well as every day of the week have different statistical properties. So you need off-peak, on-peak, weekdays, and weekends to derive these stats in batch.
> > > > >
> > > > > Any other design ideas?
> > > > >
> > > > > Thanks,
> > > > > James
> > > > >
> > > > > On 4/13/16, 1:59 PM, "Nick Allen" <[email protected]> wrote:
> > > > >
> > > > > > If the tool starts at Kafka, the user would have to already have committed to the investment in the infrastructure and time to set up the sensors that feed Kafka, and Kafka itself. Maybe it would need to be further upstream?
> > > > > >
> > > > > > On Apr 13, 2016 1:05 PM, "James Sirota" <[email protected]> wrote:
> > > > > >
> > > > > > > Hi George,
> > > > > > >
> > > > > > > This article describes micro-tuning of an existing cluster. What I am proposing is a level up from that. When you start with Metron, how do you even know how many nodes you need? And of these nodes, how many do you allocate to Storm, indexing, storage? How much storage do you need? Tuning would be the next step in the process, but this tool would answer more fundamental questions about what a Metron deployment should look like given the number of telemetries and retention policies of the enterprise.
> > > > > > >
> > > > > > > The best way to get this data (in my opinion) is to have some tool that we can plug into Metron's point of ingest (Kafka topics) and run for about a week or a month to be able to figure that out and spit out the relevant metrics. Based on these metrics we can figure out the fundamental things about what Metron should look like. Tuning would be the next step.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > James
> > > > > > >
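A rough sketch of what plugging into that point of ingest might look like: just a Kafka consumer on the telemetry topic, in its own consumer group, computing rates and sizes without writing anything back. The broker address, topic name, and output format below are placeholders, not anything specified in this thread.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

/**
 * Hypothetical assessment tool that sits on a telemetry's Kafka topic and
 * tracks ingest rate and message size without landing anything in the cluster.
 */
public class TopicAssessment {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:6667");      // placeholder broker
    props.put("group.id", "metron-assessment");          // separate group: no impact on real consumers
    props.put("key.deserializer", ByteArrayDeserializer.class.getName());
    props.put("value.deserializer", ByteArrayDeserializer.class.getName());

    long messages = 0, totalBytes = 0, maxBytes = 0;
    long windowStart = System.currentTimeMillis();

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("bro"));   // hypothetical telemetry topic
      while (true) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<byte[], byte[]> record : records) {
          int size = record.serializedValueSize();
          messages++;
          totalBytes += size;
          maxBytes = Math.max(maxBytes, size);
        }
        long elapsed = System.currentTimeMillis() - windowStart;
        if (elapsed >= 60_000) {                              // one sample per minute
          System.out.printf("events/sec=%.1f avgSize=%.1f maxSize=%d%n",
              messages * 1000.0 / elapsed,
              messages > 0 ? (double) totalBytes / messages : 0.0, maxBytes);
          messages = 0; totalBytes = 0; maxBytes = 0;
          windowStart = System.currentTimeMillis();
        }
      }
    }
  }
}
```

Running one of these per telemetry topic for a week or a month would yield the event-rate and message-size distributions without standing up the rest of the stack.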
> > > > > > > On 4/13/16, 9:52 AM, "George Vetticaden" <[email protected]> wrote:
> > > > > > >
> > > > > > > > I have used the following Kafka and Storm best practices guide at numerous customer implementations:
> > > > > > > >
> > > > > > > > https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html
> > > > > > > >
> > > > > > > > We need to have something similar and prescriptive for Metron based on:
> > > > > > > >
> > > > > > > > 1. What data sources are we enabling
> > > > > > > > 2. What enrichment services are we enabling
> > > > > > > > 3. What threat intel services are we enabling
> > > > > > > > 4. What are we indexing into Solr/Elastic, and for how long
> > > > > > > > 5. What are we persisting into HDFS
> > > > > > > >
> > > > > > > > Ideally, the Metron assessment tool, combined with an introspection of the user's Ansible configuration, should drive which Ambari blueprint type and configuration are used when the cluster is spun up and the Storm topology is deployed.
> > > > > > > >
> > > > > > > > --
> > > > > > > > George Vetticaden, Principal, COE
> > > > > > > > [email protected]
> > > > > > > > (630) 909-9138
> > > > > > > >
> > > > > > > > On 4/13/16, 11:40 AM, "George Vetticaden" <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > +1 to James' suggestion.
> > > > > > > > >
> > > > > > > > > We also need to consider not just the data volume and storage requirements for proper cluster sizing, but the processing requirements as well. Given that in the new architecture we have moved to a single enrichment topology that will support all data sources, proper sizing of the enrichment topology will be even more crucial to maintaining SLA and HA requirements. The following key questions will apply to each parser topology and to the single enrichment topology:
> > > > > > > > >
> > > > > > > > > 1. Number of workers?
> > > > > > > > > 2. Number of workers per machine?
> > > > > > > > > 3. Size of each worker (in memory)?
> > > > > > > > > 4. Supervisor memory settings?
> > > > > > > > >
> > > > > > > > > The assessment tool should also be used to size topologies correctly.
> > > > > > > > >
> > > > > > > > > Tuning Kafka, HBase, and Solr/Elastic should also be governed by the Metron assessment tool.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > George Vetticaden
> > > > > > > > >
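To make those knobs concrete, here is a rough sketch of where the first three questions surface when a Storm topology is submitted. The worker count and heap size are placeholders rather than recommendations, and the exact config keys depend on the Storm version in use; questions 2 and 4 are cluster-side storm.yaml settings rather than per-topology ones.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

/**
 * Hypothetical submission of an enrichment-style topology, showing where the
 * sizing questions above land in code. The numbers are placeholders that an
 * assessment tool would ultimately recommend.
 */
public class SizingExample {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    // ... spouts and bolts, with parallelism hints derived from measured rates ...

    Config conf = new Config();
    conf.setNumWorkers(4);                                    // 1. number of workers
    conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2048m");  // 3. heap size of each worker
    // 2. workers per machine and 4. supervisor memory are set cluster-side in
    //    storm.yaml (supervisor.slots.ports, supervisor.childopts).

    StormSubmitter.submitTopology("enrichment", conf, builder.createTopology());
  }
}
```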
> > > > > > > > > On 4/13/16, 11:28 AM, "James Sirota" <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Prior to adoption of Metron, each adopting entity needs to guesstimate its data volume and data storage requirements so they can size their cluster properly. I propose the creation of an assessment tool that can plug in to a Kafka topic for a given telemetry and, over time, produce statistics on ingest volumes and storage requirements. The idea is that prior to adoption of Metron someone can set up all the feeds and Kafka topics, but instead of deploying Metron right away they would deploy this tool. The tool would then produce statistics on data ingest/storage requirements and all the relevant information needed for cluster sizing.
> > > > > > > > > >
> > > > > > > > > > Some of the metrics that can be recorded are:
> > > > > > > > > >
> > > > > > > > > >   * Number of system events per second (average, max, mean, standard dev)
> > > > > > > > > >   * Message size (average, max, mean, standard dev)
> > > > > > > > > >   * Average number of peaks
> > > > > > > > > >   * Duration of peaks (average, max, mean, standard dev)
> > > > > > > > > >
> > > > > > > > > > If the parser for a telemetry exists, the tool can produce additional statistics:
> > > > > > > > > >
> > > > > > > > > >   * Number of keys/fields parsed (average, max, mean, standard dev)
> > > > > > > > > >   * Length of field parsed (average, max, mean, standard dev)
> > > > > > > > > >   * Length of key parsed (average, max, mean, standard dev)
> > > > > > > > > >
> > > > > > > > > > The tool can run for a week or a month and produce these kinds of statistics. Then, once the statistics are available, we can come up with guidance documentation for a recommended cluster setup. Otherwise it's hard to properly size a cluster and set up streaming parallelism without knowing these metrics.
> > > > > > > > > >
> > > > > > > > > > Thoughts/ideas?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > James
> > > >
> > > > --
> > > > Nick Allen <[email protected]>
> >
> > --
> > Nick Allen <[email protected]>
>
> --
> Jon

--
Nick Allen <[email protected]>
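For the per-metric summaries proposed above (count, mean, max, standard deviation), one sketch of how the tool could maintain them in a single streaming pass, without storing the raw observations, is Welford's online algorithm. The class name is hypothetical.

```java
/**
 * Hypothetical helper for the proposed metrics: maintains count, mean, max,
 * and standard deviation of a stream of observations (message sizes, events
 * per second, field lengths, ...) without keeping the observations themselves.
 * Uses Welford's online algorithm for numerically stable variance.
 */
public class RunningStats {
  private long count = 0;
  private double mean = 0.0;
  private double m2 = 0.0;     // sum of squared deviations from the running mean
  private double max = Double.NEGATIVE_INFINITY;

  public void add(double x) {
    count++;
    double delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);  // second factor uses the updated mean, per Welford
    max = Math.max(max, x);
  }

  public long count()   { return count; }
  public double mean()  { return mean; }
  public double max()   { return max; }
  public double stdDev() {
    return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
  }
}
```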
