Hi Casey 

Thanks. 

What I meant was this - are we discussing the changes in arch and the
re-factoring plans somewhere in the open? Is there any UI design work
happening in the open?

debo

On 4/1/16, 7:54 AM, "Casey Stella" <[email protected]> wrote:

>Hi Debo,
>
>Thanks!  I'm glad that it's useful.  The issue is that most things are in
>flux at the moment.  That being said, we have some pretty complete
>documentation at
>https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture
>This email was an attempt to connect the code to the architecture to help
>out new contributors.  I definitely think something should be put up
>around this theme on the wiki or the website.
>
>As things move forward, we'll definitely work to keep the documentation
>current.
>
>Casey
>
>On Fri, Apr 1, 2016 at 10:39 AM, Debo Dutta (dedutta) <[email protected]>
>wrote:
>
>> Hi Casey
>>
>> This is a good intro. We should have this on our web pages. On the topic
>> of metron streaming re-arch and re-factor, is there a document that is
>> being worked on?
>>
>> The dev list is quiet :)
>>
>> debo
>>
>>
>>
>>
>> On 4/1/16, 6:08 AM, "Casey Stella" <[email protected]> wrote:
>>
>> >Hey John,
>> >
>> >First of all, thanks for the contributions.  Contributions make open
>> >source work, so thanks so much for that.
>> >
>> >The structure of metron-streaming will likely be shifting.  The lay of
>> >the land is that the last few months have seen a rearchitecture of a lot
>> >of the old opensoc code.  As it stands, there's some code that is no
>> >longer used and the organization could use some work.  As such, expect
>> >this structure to shift a bit.  This is one of the reasons that there's
>> >been less formal documentation than there will be going forward (I
>> >promise :).
>> >
>> >However, let's consider the structure as it stands now (I am going to
>> >skip the projects that I do not believe are being actively used).  This
>> >is just intended to give some color to the good work already done at
>> >https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture:
>> >
>> >   - Metron-Pcap_Service
>> >      - This is the REST service which serves up packet capture data
>> >      from HBase (at present).  The requests come in through the pcap
>> >      panel in kibana.
>> >      - Check out org.apache.metron.pcapservice.PcapGetterHBaseImpl to
>> >      see how this works.
>> >      - I'd recommend looking at the unit test
>> >      org.apache.metron.pcapservice.PcapGetterHBaseImplTest
>> >   - Metron-Topologies
>> >      - This project mostly, at this point, holds the Storm topologies in
>> >      the form of Flux yaml files.  There are generally two types of
>> >      topologies, parser topologies and the enrichment topology.
>> >      - The aim of the sensor specific topologies is to take the raw
>> >      sensor output and normalize it to some extent.  The input is the
>> >      raw sensor data via kafka and the output is a semi-normalized JSON
>> >      (there is still sensor specific stuff in there, but we ensure that
>> >      src, dest ip/port and protocol are all there in predictable
>> >      fieldnames) to Kafka.
>> >         - Yaf:
>> >         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/yaf
>> >         - Bro:
>> >         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/bro
>> >         - Snort:
>> >         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/snort
>> >      - The enrichment topology is intended to pull the quasi-normalized
>> >      JSON out and add enrichments.  Enrichments come in two varieties
>> >      now, threat intelligence and enrichments such as geo tagging.
>> >         - The topology is at
>> >         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/enrichment
>> >         - By *far* the best way to understand what is going on
>> >         enrichment-wise is to look at the integration test @
>> >         metron-streaming/Metron-Topologies/src/test/java/org/apache/metron/integration/EnrichmentIntegrationTest.java.
>> >         This test spins up in memory instances of storm, kafka and a
>> >         mock HBase table and runs real data through the topology,
>> >         ensuring the output is what we would expect.
>> >      - Due to volume, the pcap data actually skips the enrichment
>> >      topology and goes directly to HBase.
>> >      (see
>> >      metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/pcap)
>> >   - Metron-EnrichmentAdapters
>> >      - This is where the actual enrichment adapters live.  Also, threat
>> >      intel adapters live here.  This is in the process of a bit of
>> >      churn.  The thing to note about the enrichments is that we have
>> >      moved to a split/join style architecture.  More on that can be
>> >      found at the documentation associated with
>> >      https://issues.apache.org/jira/browse/METRON-35
>> >      - One more thing to note, enrichment adapters take their
>> >      configuration from zookeeper so that we can adjust them in a
>> >      running topology without taking the topology down.  See
>> >      ConfiguredBolt and GenericEnrichmentBolt for reasonable examples
>> >      of how that looks.
>> >   - Metron-Indexing
>> >      - This is largely going to get split into two projects for
>> >      Elasticsearch and Solr, but there is also an HDFS indexing bolt
>> >      (sending enriched messages to HDFS for future analysis) that might
>> >      be of interest.  Again, the EnrichmentIntegrationTest drives data
>> >      through these pathways.
>> >   - Metron-DataLoads
>> >      - This is a project intended to load data into HBase for use in
>> >      the enrichment and threat intel adapters.  Right now, in the
>> >      current RC, this is just for threat intel.
>> >      - The loaders supported currently are:
>> >         - Loading CSV files or Stix files via mapreduce into HBase (see
>> >         ThreatIntelBulkLoader and the associated integration test
>> >         BulkLoadMapperIntegrationTest)
>> >         - Loading threat intel data via a Taxii feed (see TaxiiLoader
>> >         and the associated integration test TaxiiIntegrationTest)
>> >      - In a PR submitted today by me, this will be generalized to
>> >      support loading enrichment data into HBase along with an
>> >      accompanying enrichment adapter which pulls enrichment data from
>> >      HBase.  Also, there will be a flat file loader, so you can point
>> >      to a CSV file and load enrichment or threat intel data into HBase.
>> >   - Metron-MessageParsers
>> >      - You have the right of it below
>> >   - Metron-Common
>> >      - Common utilities
>> >
>> >Anyway, I hope that helps.  I'd recommend digging into the tests,
>> >especially the EnrichmentIntegrationTest to see how things work.  Also,
>> >watch out for the structure to shift under your feet for a bit here.
>> >
>> >Hope this helps!
>> >
>> >Looking forward to more PRs. :)
>> >
>> >Casey
>> >
>> >On Fri, Apr 1, 2016 at 12:38 AM, John <[email protected]> wrote:
>> >
>> >> Hello Dev@Metron,
>> >>
>> >> I've been thinking about getting more involved with Metron. I've
>> >> already submitted a couple very simple PRs that got approved and one
>> >> is now merged into master. The ansible and vagrant scripts have made
>> >> it super easy to spin up a 10-node setup in AWS or a local VM setup
>> >> for testing. So now I'm diving into the Metron-Streaming modules to
>> >> try and figure out what roles each of them plays. I haven't dug super
>> >> deep yet, so based on the little I've seen, plus the individual
>> >> README's -- this is what I've gathered so far at a high-level...
>> >>
>> >>    - *Metron-Pcap_Service* : Example service that grabs packets and
>> >>    stores them to HBase.
>> >>    - *Metron-DataServices* : How the messages(/events) get into the
>> >>    pipeline.
>> >>    - *Metron-MessageParsers* : Takes raw messages (which can be binary
>> >>    formats) and converts them to a common format of source/destination
>> >>    ip/port/protocol w/ timestamp+message. Looks like a couple of the
>> >>    parsing patterns were forked from Logstash.
>> >>    - *Metron-EnrichmentAdapters* : As the messages come in, extra
>> >>    metadata can be added, like geo, whois, etc. So I guess the parsed
>> >>    message + any enrichment adapters you have enabled would be "the
>> >>    model".
>> >>    - *Metron-DataLoads* : How to get the enrichment data into the
>> >>    system.
>> >>    - *Metron-Alerts* : Sends the message onto the message stream like
>> >>    normal, but will also send it to the alert stream.
>> >>    - *Metron-Indexing* : This is the main output of the streaming
>> >>    system, which is currently Elasticsearch/Kibana(v3)... but looks
>> >>    like you're in the middle of adding Solr support too.
>> >>    - *Metron-Topologies* : To configure all this stuff to meet your
>> >>    needs (ex. which telemetries you want to collect).
>> >>    - *Metron-Testing* : To test this whole thing without needing
>> >>    servers or data.
>> >>    - *Metron-Common* : Dev tools/packages shared across modules.
>> >>
>> >> Totally not looking for someone to blow a bunch of time on a super
>> >> detailed response; just curious if I'm totally off base on any of
>> >> these modules or if I missed something super big.
>> >>
>> >>
>> >> Thanks!
>> >> John
>> >>
>>
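
A rough illustration of the parser-topology contract described in the quoted
overview (semi-normalized JSON that always carries source/destination IP, port
and protocol under predictable field names, with sensor-specific fields left
in place). This is only a sketch: the field names and both helper functions
are assumptions made up for the example, since the thread does not spell out
the actual names, and it is Python rather than Metron's own Java.

```python
import json

# Hypothetical field names: the thread only says that src/dest ip, port and
# protocol appear under predictable names, not what those names are.
REQUIRED_FIELDS = (
    "ip_src_addr",
    "ip_src_port",
    "ip_dst_addr",
    "ip_dst_port",
    "protocol",
)


def missing_fields(raw_json):
    """Return the required fields absent from one parser-output message."""
    message = json.loads(raw_json)
    return [name for name in REQUIRED_FIELDS if name not in message]


def is_normalized(raw_json):
    """True when every required field is present; extra fields are allowed."""
    return not missing_fields(raw_json)


# A message that keeps one sensor-specific field next to the common ones.
sample = json.dumps({
    "ip_src_addr": "10.0.0.1",
    "ip_src_port": 49152,
    "ip_dst_addr": "10.0.0.2",
    "ip_dst_port": 80,
    "protocol": "TCP",
    "sensor_specific": "kept as-is",
})
```

A check like this could sit between a parser and the enrichment step in a
test, so that a new sensor's output is rejected early if one of the common
fields is missing.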
