Re: Metron-Streaming Modules...

John Fri, 01 Apr 2016 07:58:44 -0700

Awesome, Casey & Ryan! I wasn't aware of the wiki, so I'll definitely be
reading through that. And the extra details you sent, Casey, will definitely
help too. Thanks!


On Fri, Apr 1, 2016 at 10:54 AM, Casey Stella <[email protected]> wrote:

> Hi Debo,
>
> Thanks!  I'm glad that it's useful.  The issue is that most things are in
> flux at the moment.  That being said, we have some pretty complete
> documentation at
> https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture
> This email was an attempt to connect the code to the architecture to help
> out new contributors.  I definitely think something should be put up around
> this theme on the wiki or the website.
>
> As things move forward, we'll definitely work to keep the documentation
> current.
>
> Casey
>
> On Fri, Apr 1, 2016 at 10:39 AM, Debo Dutta (dedutta) <[email protected]>
> wrote:
>
> > Hi Casey
> >
> > This is a good intro. We should have this on our web pages. On the topic
> > of metron streaming re-arch and re-factor, is there a document that is
> > being worked on?
> >
> > The dev list is quiet :)
> >
> > debo
> >
> >
> >
> >
> > On 4/1/16, 6:08 AM, "Casey Stella" <[email protected]> wrote:
> >
> > >Hey John,
> > >
> > >First of all, thanks for the contributions.  Contributions make open
> > >source work, so thanks so much for that.
> > >
> > >The structure of metron-streaming will likely be shifting.  The lay of
> > >the land is that the last few months have seen a rearchitecture of a lot
> > >of the old opensoc code.  As it stands, there's some code that is no
> > >longer used and the organization could use some work.  As such, expect
> > >this structure to shift a bit.  This is one of the reasons that there's
> > >been less formal documentation than there will be going forward (I
> > >promise :).
> > >
> > >However, let's consider the structure as it stands now (I am going to
> > >skip the projects that I do not believe are being actively used).  This
> > >is just intended to give some color to the good work already done at
> > >https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture:
> > >
> > >   - Metron-Pcap_Service
> > >      - This is the REST service which serves up packet capture data
> > >      from HBase (at present).  The requests come in through the pcap
> > >      panel in kibana.
> > >      - Check out org.apache.metron.pcapservice.PcapGetterHBaseImpl to
> > >      see how this works.
> > >      - I'd recommend looking at the unit test
> > >      org.apache.metron.pcapservice.PcapGetterHBaseImplTest
> > >   - Metron-Topologies
> > >      - This project mostly, at this point, holds the Storm topologies in
> > >      the form of Flux yaml files.  There are generally two types of
> > >      topologies: parser topologies and the enrichment topology.
> > >      - The aim of the sensor-specific parser topologies is to take the
> > >      raw sensor output and normalize it to some extent.  The input is
> > >      the raw sensor data via Kafka and the output is a semi-normalized
> > >      JSON (there is still sensor-specific stuff in there, but we ensure
> > >      that src, dest ip/port and protocol are all there in predictable
> > >      field names) to Kafka.
> > >         - Yaf:
> > >         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/yaf
> > >         - Bro:
> > >         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/bro
> > >         - Snort:
> > >         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/snort
> > >      - The enrichment topology is intended to pull the quasi-normalized
> > >      JSON out and add enrichments.  Enrichments come in two varieties
> > >      now: threat intelligence and enrichments such as geo tagging.
> > >         - The topology is at
> > >         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/enrichment
> > >         - By *far* the best way to understand what is going on
> > >         enrichment-wise is to look at the integration test at
> > >         metron-streaming/Metron-Topologies/src/test/java/org/apache/metron/integration/EnrichmentIntegrationTest.java.
> > >         This test spins up in-memory instances of storm, kafka and a
> > >         mock HBase table and runs real data through the topology,
> > >         ensuring the output is what we would expect.
> > >      - Due to volume, the pcap data actually skips the enrichment
> > >      topology and goes directly to HBase (see
> > >      metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/pcap).
> > >   - Metron-EnrichmentAdapters
> > >      - This is where the actual enrichment adapters live.  Also, threat
> > >      intel adapters live here.  This is in the process of a bit of
> > >      churn.  The thing to note about the enrichments is that we have
> > >      moved to a split/join style architecture.  More on that can be
> > >      found in the documentation associated with
> > >      https://issues.apache.org/jira/browse/METRON-35
> > >      - One more thing to note: enrichment adapters take their
> > >      configuration from zookeeper so that we can adjust them in a
> > >      running topology without taking the topology down.  See
> > >      ConfiguredBolt and GenericEnrichmentBolt for reasonable examples
> > >      of how that looks.
> > >   - Metron-Indexing
> > >      - This is largely going to get split into two projects for
> > >      Elasticsearch and Solr, but there is also an HDFS indexing bolt
> > >      (sending enriched messages to HDFS for future analysis) that might
> > >      be of interest.  Again, the EnrichmentIntegrationTest drives data
> > >      through these pathways.
> > >   - Metron-DataLoads
> > >      - This is a project intended to load data into HBase for use in
> > >      the enrichment and threat intel adapters.  Right now, in the
> > >      current RC, this is just for threat intel.
> > >      - The loaders supported currently are:
> > >         - Loading CSV files or Stix files via mapreduce into HBase (see
> > >         ThreatIntelBulkLoader and the associated integration test
> > >         BulkLoadMapperIntegrationTest)
> > >         - Loading threat intel data via a Taxii feed (see TaxiiLoader
> > >         and the associated integration test TaxiiIntegrationTest)
> > >      - In a PR submitted today by me, this will be generalized to
> > >      support loading enrichment data into HBase along with an
> > >      accompanying enrichment adapter which pulls enrichment data from
> > >      HBase.  Also, there will be a flat file loader, so you can point to
> > >      a CSV file and load enrichment or threat intel data into HBase.
> > >   - Metron-MessageParsers
> > >      - You have the right of it below
> > >   - Metron-Common
> > >      - Common utilities
> > >
> > >Anyway, I hope that helps.  I'd recommend digging into the tests,
> > >especially the EnrichmentIntegrationTest to see how things work.  Also,
> > >watch out for the structure to shift under your feet for a bit here.
> > >
> > >Hope this helps!
> > >
> > >Looking forward to more PRs. :)
> > >
> > >Casey
> > >
> > >On Fri, Apr 1, 2016 at 12:38 AM, John <[email protected]> wrote:
> > >
> > >> Hello Dev@Metron,
> > >>
> > >> I've been thinking about getting more involved with Metron. I've
> > >> already submitted a couple of very simple PRs that got approved and one
> > >> is now merged into master. The ansible and vagrant scripts have made it
> > >> super easy to spin up a 10-node setup in AWS or a local VM setup for
> > >> testing. So now I'm diving into the Metron-Streaming modules to try and
> > >> figure out what role each of them plays. I haven't dug super deep yet,
> > >> so based on the little I've seen, plus the individual READMEs -- this is
> > >> what I've gathered so far at a high level...
> > >>
> > >>    - *Metron-Pcap_Service* : Example service that grabs packets and
> > >>    stores them to HBase.
> > >>    - *Metron-DataServices* : How the messages(/events) get into the
> > >>    pipeline.
> > >>    - *Metron-MessageParsers* : Takes raw messages (which can be binary
> > >>    formats) and converts them to a common format of source/destination
> > >>    ip/port/protocol w/ timestamp+message. Looks like a couple of the
> > >>    parsing patterns were forked from Logstash.
> > >>    - *Metron-EnrichmentAdapters* : As the messages come in, extra
> > >>    metadata can be added, like geo, whois, etc. So I guess the parsed
> > >>    message + any enrichment adapters you have enabled would be "the
> > >>    model".
> > >>    - *Metron-DataLoads* : How to get the enrichment data into the
> > >>    system.
> > >>    - *Metron-Alerts* : Sends the message onto the message stream like
> > >>    normal, but will also send it to the alert stream.
> > >>    - *Metron-Indexing* : This is the main output of the streaming
> > >>    system, which is currently Elasticsearch/Kibana(v3)… but looks like
> > >>    you're in the middle of adding Solr support too.
> > >>    - *Metron-Topologies* : To configure all this stuff to meet your
> > >>    needs (ex. which telemetries you want to collect).
> > >>    - *Metron-Testing* : To test this whole thing without needing
> > >>    servers or data.
> > >>    - *Metron-Common* : Dev tools/packages shared across modules.
> > >>
> > >> Totally not looking for someone to blow a bunch of time on a super
> > >> detailed response; just curious if I'm totally off base on any of these
> > >> modules or if I missed something super big.
> > >>
> > >>
> > >> Thanks!
> > >> John
> > >>
> >
>
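
To make the parser and enrichment stages from Casey's overview a bit more
concrete, here is a minimal sketch of the data flow in plain Python. It is
not Metron code: the field names, the pipe-delimited input format, the lookup
tables and every function name are assumptions made up for illustration. The
thread only says that parsers emit semi-normalized JSON with src/dest ip, port
and protocol under predictable names, and that enrichment uses a split/join
style.

```python
import json

# Hypothetical field names; the thread only promises that src/dest ip,
# port and protocol appear under predictable names.
COMMON_FIELDS = (
    "ip_src_addr", "ip_src_port", "ip_dst_addr", "ip_dst_port", "protocol",
)


def parse_example_sensor(raw_line):
    """Parser step: turn one raw, pipe-delimited line (made-up format) into a dict."""
    src_ip, src_port, dst_ip, dst_port, proto, extra = raw_line.split("|")
    return {
        "ip_src_addr": src_ip,
        "ip_src_port": int(src_port),
        "ip_dst_addr": dst_ip,
        "ip_dst_port": int(dst_port),
        "protocol": proto,
        "sensor_extra": extra,  # sensor-specific leftovers are still allowed
    }


def geo_enrichment(message):
    """Stand-in for a geo adapter; the lookup table is invented."""
    fake_geo = {"10.0.0.1": "US"}
    return {"geo.src.country": fake_geo.get(message["ip_src_addr"], "unknown")}


def threat_intel_enrichment(message):
    """Stand-in for a threat intel adapter; the blocklist is invented."""
    blocklist = {"203.0.113.9"}
    return {"threat.dst.is_alert": message["ip_dst_addr"] in blocklist}


def enrich(message, adapters):
    """Split: hand the message to every adapter. Join: merge their results."""
    partial_results = [adapter(message) for adapter in adapters]  # split
    enriched = dict(message)
    for partial in partial_results:  # join
        enriched.update(partial)
    return enriched


raw = "10.0.0.1|51234|203.0.113.9|443|TCP|flags=SYN"
parsed = parse_example_sensor(raw)
enriched = enrich(parsed, [geo_enrichment, threat_intel_enrichment])
print(json.dumps(enriched, indent=2, sort_keys=True))
```

In the real topologies the split and join happen across Storm bolts with
Kafka in between; the sketch runs the adapters one after another purely to
show the shape of the messages at each stage.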
