Per the Apache Way it would be desirable to put together an architecture proposal and have the community take a look at it before implementing. I would propose a simple Storm topology that attaches to a Kafka topic and records statistics such as the number of messages and the total throughput over n-sized time bins. What specific requirements do you have in mind?

Thanks,
James

-------------------
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
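A rough sketch of what that topology could look like, illustrative only: it assumes the storm-kafka-client spout, and the broker address, topic, and class names are placeholders. A Kafka spout feeds a bolt that counts messages and bytes, and a tick tuple closes each time bin.

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.utils.TupleUtils;

import java.util.Collections;
import java.util.Map;

public class AssessmentTopology {

    // Counts messages and (approximate) bytes; a tick tuple closes each bin.
    public static class ThroughputBolt extends BaseBasicBolt {
        private final int binSeconds;
        private transient long count;
        private transient long bytes;

        public ThroughputBolt(int binSeconds) {
            this.binSeconds = binSeconds;
        }

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            if (TupleUtils.isTick(tuple)) {
                // Close the bin: log (or emit) the stats, then reset the counters.
                System.out.printf("bin: %d msgs, %d bytes, %.1f msg/s%n",
                        count, bytes, (double) count / binSeconds);
                count = 0;
                bytes = 0;
            } else {
                count++;
                bytes += tuple.getStringByField("value").length();
            }
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            // Ask Storm to deliver a tick tuple to this bolt every binSeconds.
            return Collections.singletonMap(
                    Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, binSeconds);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Nothing downstream; this bolt only accumulates statistics.
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka", new KafkaSpout<>(
                KafkaSpoutConfig.builder("broker:6667", "sensor-topic").build()));
        builder.setBolt("throughput", new ThroughputBolt(60))
                .shuffleGrouping("kafka");
        StormSubmitter.submitTopology("assessment", new Config(),
                builder.createTopology());
    }
}

Running that for a while and graphing the per-bin output would give the message-rate and throughput numbers discussed in the thread below.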
12.07.2016, 14:41, "[email protected]" <[email protected]>:

I can definitely give it a shot. A kickstart would be appreciated.

Jon

On Tue, Jul 12, 2016, 17:17 James Sirota <[email protected]> wrote:

Jon,

Just filed METRON-318. Is this something you would like to work on? Would you like help from us to get started?

Thanks,
James

-------------------
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org

12.07.2016, 11:53, "[email protected]" <[email protected]>:

Hi All,

Has there been any additional discussion or development regarding this? I did take a brief look around the JIRA and didn't see anything about it, but I may have missed it. Thanks,

Jon

On Fri, Apr 15, 2016 at 2:01 PM Nick Allen <[email protected]> wrote:

I definitely agree that you need this level of understanding of your cluster, and it could work the way that you describe.

I was thinking of it slightly differently, though. The metrics for this purpose (understanding the performance of an existing cluster) should come from the actual sensors themselves. For example, I need to instrument the packet capture process so that it kicks out time-series-ish metrics that you can monitor in a dashboard over time.

--
Nick Allen <[email protected]>

On Fri, Apr 15, 2016 at 1:40 PM, [email protected] <[email protected]> wrote:

However, it would be handy to have something like this perpetually running so you know when to scale a cluster up/out/down/in.

On Fri, Apr 15, 2016, 13:35 Nick Allen <[email protected]> wrote:

I think it is slightly different. I don't even want to install minimal Kafka infrastructure (look ma, no Kafka!).

The exact implementation would differ based on the data inputs that you are trying to measure, but for example:

- To understand raw packet rates, I would have a specialized sensor that counts packets and sizes on the wire. It does nothing more than that.
- To understand Netflow rates, it would watch for Netflow packets and count those.
- To understand sizing around application logs, a sensor would watch for Syslog packets and count those.

The implementation would be more similar to raw packet capture with some DPI. No Hadoop-y components required.

--
Nick Allen <[email protected]>

On Fri, Apr 15, 2016 at 1:10 PM, James Sirota <[email protected]> wrote:

So this is exactly what I am proposing: calculate the metrics on the fly without landing any data in the cluster. The problem is that enterprise data volumes are so large that you can't just point them at a Java or C++ program or sensor. You either need an existing minimal Kafka infrastructure to take that load, or you sample the data.

Thanks,
James
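A minimal sketch of the kind of standalone sensor Nick describes above, for the Syslog case, using nothing but the JDK; the port and the one-second reporting interval are arbitrary choices:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class SyslogRateSensor {
    public static void main(String[] args) throws Exception {
        final LongAdder packets = new LongAdder();
        final LongAdder bytes = new LongAdder();

        // Report the rate once per second, then reset the counters.
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            System.out.printf("%d pkts/s, %d bytes/s%n",
                    packets.sumThenReset(), bytes.sumThenReset());
        }, 1, 1, TimeUnit.SECONDS);

        try (DatagramSocket socket = new DatagramSocket(514)) {
            byte[] buf = new byte[64 * 1024];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            while (true) {
                socket.receive(packet);   // blocks until a datagram arrives
                packets.increment();
                bytes.add(packet.getLength());
            }
        }
    }
}

The Netflow and raw-packet variants would differ only in what they listen for and count.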
On 4/15/16, 9:54 AM, "Nick Allen" <[email protected]> wrote:

Or we have the assessment tool not actually land any data. The assessment tool becomes a 'sensor' in its own right. You just point the input data sets at the assessment tool, it builds metrics on the input (for example: count the number of packets per second), and then we use those metrics to estimate cluster size.

--
Nick Allen <[email protected]>

On Wed, Apr 13, 2016 at 5:45 PM, James Sirota <[email protected]> wrote:

That's an excellent point. So I think there are three ways forward.

One is that we can assume there has to be at least a minimal infrastructure in place (at least a subset of Kafka and Storm resources) to run a full-scale assessment. If you point something that blasts millions of messages per second at something like ActiveMQ, you are going to blow up. The infrastructure to receive these kinds of message volumes has to exist as a prerequisite. There is no way to get around that.

The second approach I see is sampling. Sampling is a lot less precise, and you can miss peaks that fall outside of your sampling windows. But the obvious benefit is that you don't need a cluster to process these streams; you can probably perform most of your calculations with a multithreaded Java program. Sampling poses a few design challenges. First, where do you sample? Do you sample on the sensor? (The implication here is that we have to program some sort of sampling capability into our sensors.) Do you sample on transport? (Maybe a Flume interceptor or a NiFi processor.) There is also the question of what the sampling rate should be. Not knowing the statistical properties of a stream ahead of time, it's hard to make that call.

The third option is an MR job. We can blast the data into HDFS and then go over it with MapReduce to derive the metrics we are looking for. Then we don't have to sample or set up expensive infrastructure to receive a deluge of data. But then we run into the chicken-and-egg problem that in order to size your HDFS you need to have data in HDFS. Ideally you need to capture at least one full week's worth of logs, because traffic patterns throughout the day, as well as across days of the week, have different statistical properties. So you need off-peak, on-peak, weekdays, and weekends to derive these stats in batch.

Any other design ideas?

Thanks,
James
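For the sampling option, a sketch of one building block such a multithreaded Java program could use: reservoir sampling keeps a fixed-size uniform random sample of, say, message sizes, so percentiles can be estimated later without storing the stream. The class name and capacity choice are illustrative:

import java.util.Random;

public class SizeReservoir {
    private final long[] reservoir;
    private long seen;
    private final Random rng = new Random();

    public SizeReservoir(int capacity) {
        this.reservoir = new long[capacity];
    }

    public synchronized void offer(long messageSizeBytes) {
        seen++;
        if (seen <= reservoir.length) {
            // Fill the reservoir until it reaches capacity.
            reservoir[(int) seen - 1] = messageSizeBytes;
        } else {
            // Replace a random slot with probability capacity/seen, which keeps
            // every message seen so far equally likely to be in the sample.
            long slot = (long) (rng.nextDouble() * seen);
            if (slot < reservoir.length) {
                reservoir[(int) slot] = messageSizeBytes;
            }
        }
    }
}

Because every message seen so far is equally likely to be in the sample, the effective sampling rate adapts to the stream volume without knowing its statistical properties up front, though peaks that fall between observations can still be missed.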
On 4/13/16, 1:59 PM, "Nick Allen" <[email protected]> wrote:

If the tool starts at Kafka, the user would have to have already committed to the investment in the infrastructure and the time to set up the sensors that feed Kafka, and in Kafka itself. Maybe it would need to be further upstream?

On Apr 13, 2016 1:05 PM, "James Sirota" <[email protected]> wrote:

Hi George,

This article covers micro-tuning of an existing cluster. What I am proposing is a level up from that. When you start with Metron, how do you even know how many nodes you need? And of these nodes, how many do you allocate to Storm, indexing, and storage? How much storage do you need? Tuning would be the next step in the process, but this tool would answer more fundamental questions about what a Metron deployment should look like given the number of telemetries and the retention policies of the enterprise.

The best way to get this data (in my opinion) is to have some tool that we can plug into Metron's point of ingest (the Kafka topics) and run for about a week or a month, so that it can figure this out and spit out the relevant metrics. Based on these metrics we can figure out the fundamental things about what Metron should look like. Tuning would be the next step.

Thanks,
James
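If the tool plugs in at the Kafka topics as James describes, it does not necessarily need a topology at all; a sketch with a plain consumer, assuming a reasonably recent kafka-clients API, with the broker address, group id, topic, and one-minute bin as placeholders:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class TopicMeter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:6667");
        props.put("group.id", "metron-assessment");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        long count = 0, bytes = 0, binStart = System.currentTimeMillis();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("sensor-topic"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    count++;
                    bytes += r.serializedValueSize();
                }
                long now = System.currentTimeMillis();
                if (now - binStart >= 60_000) {   // close a one-minute bin
                    System.out.printf("%d msgs, %d bytes in last bin%n",
                            count, bytes);
                    count = 0;
                    bytes = 0;
                    binStart = now;
                }
            }
        }
    }
}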
On 4/13/16, 9:52 AM, "George Vetticaden" <[email protected]> wrote:

I have used the following Kafka and Storm best practices guide at numerous customer implementations:

https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html

We need to have something similar and prescriptive for Metron, based on:
1. What data sources are we enabling
2. What enrichment services are we enabling
3. What threat intel services are we enabling
4. What are we indexing into Solr/Elastic, and for how long
5. What are we persisting into HDFS

Ideally, the Metron assessment tool, combined with an introspection of the user's Ansible configuration, should drive which Ambari blueprint type and configuration are used when the cluster is spun up and the Storm topology is deployed.

--
George Vetticaden
Principal, COE
[email protected]
(630) 909-9138

On 4/13/16, 11:40 AM, "George Vetticaden" <[email protected]> wrote:

+1 to James' suggestion.

For proper cluster sizing we need to consider not just the data volume and storage requirements but the processing requirements as well. Given that in the new architecture we have moved to a single enrichment topology that supports all data sources, proper sizing of the enrichment topology will be even more crucial to maintaining SLAs and HA requirements. The following key questions will apply to each parser topology and to the single enrichment topology:

1. Number of workers?
2. Number of workers per machine?
3. Size of each worker (in memory)?
4. Supervisor memory settings?

The assessment tool should be used to size topologies correctly as well.

Tuning Kafka, HBase, and Solr/Elastic should also be governed by the Metron assessment tool.

--
George Vetticaden
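For reference, a sketch of where the answers to those four questions would land in Storm configuration; the numbers are placeholders showing where each answer goes, not recommendations:

import org.apache.storm.Config;

public class EnrichmentSizing {
    public static Config topologyConfig() {
        Config conf = new Config();
        conf.setNumWorkers(4);                       // 1. number of workers
        // 2. Workers per machine is a cluster-side setting: the number of
        //    entries in supervisor.slots.ports on each supervisor node.
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,   // 3. per-worker heap
                "-Xmx2048m");
        // 4. Supervisor memory is likewise set cluster-side, e.g. via
        //    supervisor.childopts in storm.yaml, not per topology.
        return conf;
    }
}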
On 4/13/16, 11:28 AM, "James Sirota" <[email protected]> wrote:

Prior to adoption of Metron, each adopting entity needs to guesstimate its data volume and data storage requirements so it can size its cluster properly. I propose the creation of an assessment tool that can plug into a Kafka topic for a given telemetry and, over time, produce statistics on ingest volumes and storage requirements. The idea is that prior to adopting Metron someone can set up all the feeds and Kafka topics, but instead of deploying Metron right away they would deploy this tool. The tool would then produce statistics on data ingest/storage requirements and all the relevant information needed for cluster sizing.

Some of the metrics that can be recorded are:

* Number of system events per second (average, max, mean, standard dev)
* Message size (average, max, mean, standard dev)
* Average number of peaks
* Duration of peaks (average, max, mean, standard dev)

If the parser for a telemetry exists, the tool can produce additional statistics:

* Number of keys/fields parsed (average, max, mean, standard dev)
* Length of field parsed (average, max, mean, standard dev)
* Length of key parsed (average, max, mean, standard dev)

The tool can run for a week or a month and produce these kinds of statistics. Once the statistics are available we can come up with guidance documentation for a recommended cluster setup. Otherwise it's hard to properly size a cluster and set up streaming parallelism without knowing these metrics.

Thoughts/ideas?

Thanks,
James
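A sketch of the per-metric accumulator those lists imply: Welford's streaming algorithm yields the average and standard deviation in one pass with constant memory, alongside the max, for quantities like events per second or message size:

public class RunningStats {
    private long n;
    private double mean;
    private double m2;      // sum of squared deviations from the mean
    private double max = Double.NEGATIVE_INFINITY;

    public void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);   // uses the updated mean, per Welford
        if (x > max) {
            max = x;
        }
    }

    public double mean() { return mean; }
    public double max() { return max; }
    public double stddev() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0; }
}

One instance per metric, fed once per observation (for example, once per one-second bin for the events-per-second metric), would cover most of the lists above; peak counting and peak duration would need a threshold test layered on top.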
