Re: Metron-Streaming Modules...

Debo Dutta (dedutta) Fri, 01 Apr 2016 07:40:20 -0700

Hi Casey

This is a good intro. We should have this on our web pages. On the topic of 
metron streaming re-arch and re-factor, is there a document that is being 
worked on?


The dev list is quiet :)

debo




On 4/1/16, 6:08 AM, "Casey Stella" <[email protected]> wrote:

>Hey John,
>
>First of all, thanks for the contributions.  Contributions make open source
>work, so thanks so much for that.
>
>The structure of metron-streaming will likely be shifting.  The lay of the
>land is that the last few months have seen a rearchitecture of a lot of the
>old opensoc code.  As it stands, there's some code that is no longer used
>and the organization could use some work.  As such, expect this structure
>to shift a bit.  This is one of the reasons that there's been less formal
>documentation than there will be going forward (I promise :).
>
>However, let's consider the structure as it stands now (I am going to skip
>the projects that I do not believe are being actively used).  This is just
>intended to give some color to the good work already done at
>https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture:
>
>   - Metron-Pcap_Service
>      - This is the REST service which serves up packet capture data from
>      HBase (at present).  The requests come in through the pcap panel in
>      Kibana.
>      - Check out org.apache.metron.pcapservice.PcapGetterHBaseImpl to see
>      how this works.
>      - I'd recommend looking at the unit test
>      org.apache.metron.pcapservice.PcapGetterHBaseImplTest
>   - Metron-Topologies
>      - This project mostly, at this point, holds the Storm topologies in the
>      form of Flux yaml files.  There are generally two types of topologies,
>      parser topologies and the enrichment topology.
>      - The aim of the sensor-specific (parser) topologies is to take the raw
>      sensor output and normalize it to some extent.  The input is the raw
>      sensor data via Kafka and the output is semi-normalized JSON (there is
>      still sensor-specific stuff in there, but we ensure that src, dest
>      ip/port and protocol are all there in predictable field names) to Kafka.
>         - Yaf:
>         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/yaf
>         - Bro:
>         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/bro
>         - Snort:
>         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/snort
>      - The enrichment topology is intended to pull the quasi-normalized
>      JSON out and add enrichments.  Enrichments come in two varieties now,
>      threat intelligence and enrichments such as geo tagging.
>         - The topology is at
>         metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/enrichment
>         - By *far* the best way to understand what is going on
>         enrichment-wise is to look at the integration test at
>         metron-streaming/Metron-Topologies/src/test/java/org/apache/metron/integration/EnrichmentIntegrationTest.java.
>         This test spins up in-memory instances of Storm, Kafka and a mock
>         HBase table and runs real data through the topology, ensuring the
>         output is what we would expect.
>      - Due to volume, the pcap data actually skips the enrichment topology
>      and goes directly to HBase (see
>      metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/pcap).
>   - Metron-EnrichmentAdapters
>      - This is where the actual enrichment adapters live.  Also, threat intel
>      adapters live here.  This is in the process of a bit of churn.  The thing
>      to note about the enrichments is that we have moved to a split/join style
>      architecture.  More on that can be found in the documentation associated
>      with https://issues.apache.org/jira/browse/METRON-35
>      - One more thing to note, enrichment adapters take their
>      configuration from zookeeper so that we can adjust them in a running
>      topology without taking the topology down.  See ConfiguredBolt and
>      GenericEnrichmentBolt for reasonable examples of how that looks.
>   - Metron-Indexing
>      - This is largely going to get split into two projects for Elasticsearch
>      and Solr, but there is also an HDFS indexing bolt (sending enriched
>      messages to HDFS for future analysis) that might be of interest.  Again,
>      the EnrichmentIntegrationTest drives data through these pathways.
>   - Metron-DataLoads
>      - This is a project intended to load data into HBase for use in the
>      enrichment and threat intel adapters.  Right now, in the current RC, this
>      is just for threat intel.
>      - The loaders supported currently are:
>         - Loading CSV files or Stix files via mapreduce into HBase (see
>         ThreatIntelBulkLoader and the associated integration test
>         BulkLoadMapperIntegrationTest)
>         - Loading threat intel data via a Taxii feed (see TaxiiLoader and
>         the associated integration test TaxiiIntegrationTest)
>      - In a PR submitted today by me, this will be generalized to support
>      loading enrichment data into HBase along with an accompanying enrichment
>      adapter which pulls enrichment data from HBase.  Also, there will be a
>      flat file loader, so you can point to a CSV file and load enrichment or
>      threat intel data into HBase.
>   - Metron-MessageParsers
>      - You have the right of it below
>   - Metron-Common
>      - Common utilities
>
>Anyway, I hope that helps.  I'd recommend digging into the tests,
>especially the EnrichmentIntegrationTest to see how things work.  Also,
>watch out for the structure to shift under your feet for a bit here.
>
>Hope this helps!
>
>Looking forward to more PRs. :)
>
>Casey
>
>On Fri, Apr 1, 2016 at 12:38 AM, John <[email protected]> wrote:
>
>> Hello Dev@Metron,
>>
>> I've been thinking about getting more involved with Metron. I've already
>> submitted a couple very simple PRs that got approved and one is now merged
>> into master. The ansible and vagrant scripts have made it super easy to
>> spin up a 10-node setup in AWS or a local VM setup for testing. So now I'm
>> diving into the Metron-Streaming modules to try and figure out what role
>> each of them plays. I haven't dug super deep yet, so based on the little I've
>> seen, plus the individual READMEs, this is what I've gathered so far at a
>> high level...
>>
>>    - *Metron-Pcap_Service* : Example service that grabs packets and stores
>>    them to HBase.
>>    - *Metron-DataServices* : How the messages(/events) get into the
>>    pipeline.
>>    - *Metron-MessageParsers* : Takes raw messages (which can be binary
>>    formats) and converts them to a common format of source/destination
>>    ip/port/protocol w/ timestamp+message. Looks like a couple of the
>>    parsing patterns were forked from Logstash.
>>    - *Metron-EnrichmentAdapters* : As the messages come in, extra metadata
>>    can be added, like geo, whois, etc. So I guess the parsed message + any
>>    enrichment adapters you have enabled would be "the model".
>>    - *Metron-DataLoads* : How to get the enrichment data into the system.
>>    - *Metron-Alerts* : Sends the message onto the message stream like
>>    normal, but will also send it to the alert stream.
>>    - *Metron-Indexing* : This is the main output of the streaming system,
>>    which is currently Elasticsearch/Kibana(v3)… but looks like you're in
>>    the middle of adding Solr support too.
>>    - *Metron-Topologies* : To configure all this stuff to meet your needs
>>    (ex. which telemetries you want to collect).
>>    - *Metron-Testing* : To test this whole thing without needing servers or
>>    data.
>>    - *Metron-Common* : Dev tools/packages shared across modules.
>>
>> Totally not looking for someone to blow a bunch of time on a super detailed
>> response; just curious if I'm totally off base on any of these modules or
>> if I missed something super big.
>>
>>
>> Thanks!
>> John
>>
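For readers new to the split/join style of enrichment that Casey mentions, here is a minimal sketch of the idea in plain Java rather than Storm. Everything in it is an assumption made for illustration: the class name, the field name src_ip, the "geo" and "threat" adapter names and the lookup tables are made up, and none of it is Metron's actual API. The point is only the shape: a message is split so each enrichment gets the value it needs, and the partial results are joined back onto the original message.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Metron.
public class SplitJoinSketch {

    // Split: hand the value of one field to each enrichment adapter and
    // collect one partial result per adapter. In a real topology each branch
    // would run in its own bolt; here they simply run one after another.
    public static Map<String, String> split(Map<String, String> message, String field,
                                            Map<String, Function<String, String>> adapters) {
        Map<String, String> partials = new LinkedHashMap<>();
        String value = message.get(field);
        if (value == null) {
            return partials; // field missing, nothing to enrich
        }
        for (Map.Entry<String, Function<String, String>> e : adapters.entrySet()) {
            String result = e.getValue().apply(value);
            if (result != null) {
                partials.put(field + "." + e.getKey(), result);
            }
        }
        return partials;
    }

    // Join: merge the partial results back into a copy of the original message.
    public static Map<String, String> join(Map<String, String> message,
                                           Map<String, String> partials) {
        Map<String, String> joined = new LinkedHashMap<>(message);
        joined.putAll(partials);
        return joined;
    }

    public static void main(String[] args) {
        // Made-up lookup tables standing in for HBase-backed adapters.
        Map<String, String> geoTable = new HashMap<>();
        geoTable.put("10.0.0.1", "US");
        Map<String, String> threatTable = new HashMap<>();
        threatTable.put("10.0.0.1", "known_bad_host");

        Map<String, Function<String, String>> adapters = new LinkedHashMap<>();
        adapters.put("geo", geoTable::get);
        adapters.put("threat", threatTable::get);

        Map<String, String> message = new LinkedHashMap<>();
        message.put("src_ip", "10.0.0.1");
        message.put("protocol", "tcp");

        System.out.println(join(message, split(message, "src_ip", adapters)));
    }
}
```

Keeping split and join as separate steps is what lets each enrichment run independently; the documentation attached to METRON-35 is the place to look for how the real topology does it.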

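Casey's point about enrichment adapters taking their configuration from zookeeper, so they can be adjusted in a running topology, can be pictured with a similarly rough sketch. This is not the code of ConfiguredBolt or GenericEnrichmentBolt; the class, the method names and the comma-separated config format are all hypothetical, and the ZooKeeper watch that would trigger the update is deliberately left out. It only shows the pattern of swapping configuration atomically while messages keep flowing.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not taken from the Metron code base.
public class ReloadableConfigSketch {

    // Current set of fields to enrich; replaced as a whole on every update
    // so readers never observe a half-applied configuration.
    private final AtomicReference<Set<String>> enrichedFields =
            new AtomicReference<>(Collections.<String>emptySet());

    // Would be invoked by a ZooKeeper watch callback (not shown) with the new
    // contents of the config node, assumed here to be a comma-separated list.
    public void onConfigChange(String commaSeparatedFields) {
        Set<String> updated = new HashSet<>();
        for (String field : commaSeparatedFields.split(",")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                updated.add(trimmed);
            }
        }
        enrichedFields.set(Collections.unmodifiableSet(updated));
    }

    // Called for every message; always consults the latest configuration.
    public boolean shouldEnrich(String field) {
        return enrichedFields.get().contains(field);
    }

    public static void main(String[] args) {
        ReloadableConfigSketch bolt = new ReloadableConfigSketch();
        System.out.println(bolt.shouldEnrich("src_ip")); // no config loaded yet
        bolt.onConfigChange("src_ip, dst_ip");           // simulated config update
        System.out.println(bolt.shouldEnrich("src_ip"));
    }
}
```

Because the processing path only ever reads the current reference, a config change takes effect on the next message without restarting anything, which is the property described above.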