Hey John,

First of all, thanks for the contributions. Contributions make open source work, so thanks so much for that.

The structure of metron-streaming will likely be shifting. The lay of the land is that the last few months have seen a rearchitecture of a lot of the old OpenSOC code. As it stands, there's some code that is no longer used and the organization could use some work. As such, expect this structure to shift a bit. This is one of the reasons that there's been less formal documentation than there will be going forward (I promise :).

However, let's consider the structure as it stands now (I am going to skip the projects that I do not believe are being actively used). This is just intended to give some color to the good work already done at https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture:

- Metron-Pcap_Service
  - This is the REST service which serves up packet capture data from HBase (at present). The requests come in through the pcap panel in Kibana.
  - Check out org.apache.metron.pcapservice.PcapGetterHBaseImpl to see how this works.
  - I'd recommend looking at the unit test org.apache.metron.pcapservice.PcapGetterHBaseImplTest
- Metron-Topologies
  - This project mostly, at this point, holds the Storm topologies in the form of Flux YAML files. There are generally two types of topologies: parser topologies and the enrichment topology.
  - The aim of the sensor-specific parser topologies is to take the raw sensor output and normalize it to some extent. The input is the raw sensor data via Kafka, and the output, also written to Kafka, is a semi-normalized JSON (there is still sensor-specific stuff in there, but we ensure that src, dest ip/port and protocol are all there in predictable field names).
    - Yaf: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/yaf
    - Bro: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/bro
    - Snort: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/snort
  - The enrichment topology is intended to pull the quasi-normalized JSON out and add enrichments.
    Enrichments come in two varieties now: threat intelligence and enrichments such as geo tagging.
    - The topology is at metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/enrichment
    - By *far* the best way to understand what is going on enrichment-wise is to look at the integration test @ metron-streaming/Metron-Topologies/src/test/java/org/apache/metron/integration/EnrichmentIntegrationTest.java. This test spins up in-memory instances of Storm, Kafka and a mock HBase table and runs real data through the topology, ensuring the output is what we would expect.
  - Due to volume, the pcap data actually skips the enrichment topology and goes directly to HBase (see metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/pcap).
- Metron-EnrichmentAdapters
  - This is where the actual enrichment adapters live. Also, threat intel adapters live here. This is in the process of a bit of churn. The thing to note about the enrichments is that we have moved to a split/join style architecture. More on that can be found in the documentation associated with https://issues.apache.org/jira/browse/METRON-35
  - One more thing to note: enrichment adapters take their configuration from ZooKeeper so that we can adjust them in a running topology without taking the topology down. See ConfiguredBolt and GenericEnrichmentBolt for reasonable examples of how that looks.
- Metron-Indexing
  - This is largely going to get split into two projects for Elasticsearch and Solr, but there is also an HDFS indexing bolt (sending enriched messages to HDFS for future analysis) that might be of interest. Again, the EnrichmentIntegrationTest drives data through these pathways.
- Metron-DataLoads
  - This is a project intended to load data into HBase for use in the enrichment and threat intel adapters. Right now, in the current RC, this is just for threat intel.
  - The loaders supported currently are:
    - Loading CSV files or STIX files via MapReduce into HBase (see ThreatIntelBulkLoader and the associated integration test BulkLoadMapperIntegrationTest)
    - Loading threat intel data via a TAXII feed (see TaxiiLoader and the associated integration test TaxiiIntegrationTest)
  - In a PR submitted today by me, this will be generalized to support loading enrichment data into HBase, along with an accompanying enrichment adapter which pulls enrichment data from HBase. Also, there will be a flat file loader, so you can point to a CSV file and load enrichment or threat intel data into HBase.
- Metron-MessageParsers
  - You have the right of it below
- Metron-Common
  - Common utilities

Anyway, I hope that helps. I'd recommend digging into the tests, especially the EnrichmentIntegrationTest, to see how things work. Also, watch out for the structure to shift under your feet for a bit here.

Hope this helps! Looking forward to more PRs. :)

Casey

On Fri, Apr 1, 2016 at 12:38 AM, John <[email protected]> wrote:

> Hello Dev@Metron,
>
> I've been thinking about getting more involved with Metron. I've already
> submitted a couple very simple PRs that got approved and one is now merged
> into master. The ansible and vagrant scripts have made it super easy to
> spin up a 10-node setup in AWS or a local VM setup for testing. So now I'm
> diving into the Metron-Streaming modules to try and figure out what role
> each of them plays. I haven't dug super deep yet, so based on the little
> I've seen, plus the individual READMEs -- this is what I've gathered so
> far at a high level...
>
> - *Metron-Pcap_Service* : Example service that grabs packets and stores
>   them to HBase.
> - *Metron-DataServices* : How the messages(/events) get into the
>   pipeline.
> - *Metron-MessageParsers* : Takes raw messages (which can be binary
>   formats) and converts them to a common format of source/destination
>   ip/port/protocol w/ timestamp+message.
>   Looks like a couple of the parsing patterns forked from Logstash.
> - *Metron-EnrichmentAdapters* : As the messages come in, extra metadata
>   can be added, like geo, whois, etc. So I guess the parsed message + any
>   enrichment adapters you have enabled would be "the model".
> - *Metron-DataLoads* : How to get the enrichment data into the system.
> - *Metron-Alerts* : Sends the message onto the message stream like
>   normal, but will also send it to the alert stream.
> - *Metron-Indexing* : This is the main output of the streaming system,
>   which is currently Elasticsearch/Kibana(v3)… but looks like you're in
>   the middle of adding Solr support too.
> - *Metron-Topologies* : To configure all this stuff to meet your needs
>   (ex. which telemetries you want to collect).
> - *Metron-Testing* : To test this whole thing without needing servers or
>   data.
> - *Metron-Common* : Dev tools/packages shared across modules.
>
> Totally not looking for someone to blow a bunch of time on a super detailed
> response; just curious if I'm totally off base on any of these modules or
> if I missed something super big.
>
>
> Thanks!
> John
>
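
For illustration only, the parse -> split -> enrich -> join flow described in the reply above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Metron code: the real topologies are Storm bolts written in Java and wired together with Flux. The pipe-delimited raw format, the field names (ip_src_addr and friends), the "enrichments.<adapter>.<field>.<key>" output keys, and both adapters with their lookup tables are all made up for the example.

```python
import json

# Hypothetical adapters: each maps a single field value to extra metadata.
def geo_adapter(ip):
    # Stand-in lookup table; a real adapter would query a geo database.
    table = {"10.0.0.1": "US"}
    return {"country": table.get(ip, "unknown")}

def threat_intel_adapter(ip):
    # Stand-in blacklist; a real adapter would look the value up in HBase.
    malicious = {"203.0.113.7"}
    return {"is_alert": str(ip in malicious).lower()}

# Assumed configuration: which adapters run, and on which fields.
ADAPTERS = {"geo": geo_adapter, "threatintel": threat_intel_adapter}
ENRICHED_FIELDS = ["ip_src_addr", "ip_dst_addr"]

def parse(raw_line):
    """Parser step: turn a made-up 'src|sport|dst|dport|proto' line into
    a semi-normalized message with predictable field names."""
    src, sport, dst, dport, proto = raw_line.strip().split("|")
    return {
        "ip_src_addr": src,
        "ip_src_port": int(sport),
        "ip_dst_addr": dst,
        "ip_dst_port": int(dport),
        "protocol": proto,
    }

def split(message):
    """Split step: fan the message out into one task per (adapter, field)."""
    return [
        (name, field, message[field])
        for name in ADAPTERS
        for field in ENRICHED_FIELDS
        if field in message
    ]

def enrich(task):
    """Run one adapter on one field value and namespace its output keys."""
    name, field, value = task
    result = ADAPTERS[name](value)
    return {f"enrichments.{name}.{field}.{k}": v for k, v in result.items()}

def join(message, partials):
    """Join step: merge every partial enrichment back into the message."""
    joined = dict(message)
    for partial in partials:
        joined.update(partial)
    return joined

if __name__ == "__main__":
    msg = parse("10.0.0.1|1234|203.0.113.7|80|tcp")
    enriched = join(msg, [enrich(task) for task in split(msg)])
    print(json.dumps(enriched, indent=2, sort_keys=True))
```

In this sketch each (adapter, field) task is independent of the others, which is the property that lets a split/join layout run enrichments side by side before merging the results. For how Metron actually does it, the EnrichmentIntegrationTest and the METRON-35 documentation remain the references.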
