Hi Debo,

Thanks! I'm glad that it's useful. The issue is that most things are in flux at the moment. That being said, we have some pretty complete documentation at https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture

This email was an attempt to connect the code to the architecture to help out new contributors. I definitely think something on this theme should be put up on the wiki or the website.
As things move forward, we'll definitely work to keep the documentation current.

Casey

On Fri, Apr 1, 2016 at 10:39 AM, Debo Dutta (dedutta) <[email protected]> wrote:

> Hi Casey
>
> This is a good intro. We should have this on our web pages. On the topic
> of the metron-streaming re-architecture and refactor, is there a document
> that is being worked on?
>
> The dev list is quiet :)
>
> debo
>
> On 4/1/16, 6:08 AM, "Casey Stella" <[email protected]> wrote:
>
> > Hey John,
> >
> > First of all, thanks for the contributions. Contributions make open
> > source work, so thanks so much for that.
> >
> > The structure of metron-streaming will likely be shifting. The lay of the
> > land is that the last few months have seen a re-architecture of a lot of
> > the old OpenSOC code. As it stands, there's some code that is no longer
> > used and the organization could use some work. As such, expect this
> > structure to shift a bit. This is one of the reasons that there's been
> > less formal documentation than there will be going forward (I promise :).
> >
> > However, let's consider the structure as it stands now (I am going to
> > skip the projects that I do not believe are being actively used). This is
> > just intended to give some color to the good work already done at
> > https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture
> >
> >  - Metron-Pcap_Service
> >    - This is the REST service which serves up packet capture data from
> >      HBase (at present). The requests come in through the pcap panel in
> >      Kibana.
> >      - Check out org.apache.metron.pcapservice.PcapGetterHBaseImpl to
> >        see how this works.
> >      - I'd recommend looking at the unit test
> >        org.apache.metron.pcapservice.PcapGetterHBaseImplTest
> >  - Metron-Topologies
> >    - At this point, this project mostly holds the Storm topologies in
> >      the form of Flux YAML files. There are generally two types of
> >      topologies: parser topologies and the enrichment topology.
> >      - The aim of the sensor-specific parser topologies is to take the
> >        raw sensor output and normalize it to some extent. The input is
> >        the raw sensor data via Kafka, and the output, written back to
> >        Kafka, is semi-normalized JSON (there is still sensor-specific
> >        content in there, but we ensure that the source and destination
> >        IP/port and the protocol are all present under predictable field
> >        names).
> >        - Yaf:
> >          metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/yaf
> >        - Bro:
> >          metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/bro
> >        - Snort:
> >          metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/snort
> >      - The enrichment topology is intended to pull the quasi-normalized
> >        JSON out and add enrichments. Enrichments currently come in two
> >        varieties: threat intelligence, and enrichments such as geo
> >        tagging.
> >        - The topology is at
> >          metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/enrichment
> >        - By *far* the best way to understand what is going on
> >          enrichment-wise is to look at the integration test at
> >          metron-streaming/Metron-Topologies/src/test/java/org/apache/metron/integration/EnrichmentIntegrationTest.java.
> >          This test spins up in-memory instances of Storm, Kafka and a
> >          mock HBase table and runs real data through the topology,
> >          ensuring the output is what we would expect.
> >      - Due to volume, the pcap data actually skips the enrichment
> >        topology and goes directly to HBase (see
> >        metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/pcap).
> >  - Metron-EnrichmentAdapters
> >    - This is where the actual enrichment adapters live; the threat intel
> >      adapters live here too. This area is going through a bit of churn.
> >      The thing to note about the enrichments is that we have moved to a
> >      split/join style architecture.
> >      More on that can be found in the documentation associated with
> >      https://issues.apache.org/jira/browse/METRON-35
> >    - One more thing to note: enrichment adapters take their
> >      configuration from ZooKeeper so that we can adjust them in a
> >      running topology without taking the topology down. See
> >      ConfiguredBolt and GenericEnrichmentBolt for reasonable examples of
> >      how that looks.
> >  - Metron-Indexing
> >    - This is largely going to get split into two projects, one for
> >      Elasticsearch and one for Solr, but there is also an HDFS indexing
> >      bolt (sending enriched messages to HDFS for future analysis) that
> >      might be of interest. Again, the EnrichmentIntegrationTest drives
> >      data through these pathways.
> >  - Metron-DataLoads
> >    - This project is intended to load data into HBase for use in the
> >      enrichment and threat intel adapters. Right now, in the current RC,
> >      it is just for threat intel.
> >    - The loaders currently supported are:
> >      - Loading CSV or STIX files into HBase via MapReduce (see
> >        ThreatIntelBulkLoader and the associated integration test
> >        BulkLoadMapperIntegrationTest)
> >      - Loading threat intel data via a TAXII feed (see TaxiiLoader and
> >        the associated integration test TaxiiIntegrationTest)
> >    - In a PR I submitted today, this will be generalized to support
> >      loading enrichment data into HBase, along with an accompanying
> >      enrichment adapter which pulls enrichment data from HBase. There
> >      will also be a flat file loader, so you can point to a CSV file and
> >      load enrichment or threat intel data into HBase.
> >  - Metron-MessageParsers
> >    - You have the right of it below.
> >  - Metron-Common
> >    - Common utilities
> >
> > Anyway, I hope that helps. I'd recommend digging into the tests,
> > especially the EnrichmentIntegrationTest, to see how things work. Also,
> > watch out for the structure to shift under your feet for a bit here.
> >
> > Hope this helps!
> >
> > Looking forward to more PRs.
> > :)
> >
> > Casey
> >
> > On Fri, Apr 1, 2016 at 12:38 AM, John <[email protected]> wrote:
> >
> > > Hello Dev@Metron,
> > >
> > > I've been thinking about getting more involved with Metron. I've already
> > > submitted a couple of very simple PRs that got approved, and one is now
> > > merged into master. The Ansible and Vagrant scripts have made it super
> > > easy to spin up a 10-node setup in AWS or a local VM setup for testing.
> > > So now I'm diving into the Metron-Streaming modules to try to figure out
> > > what role each of them plays. I haven't dug super deep yet, so based on
> > > the little I've seen, plus the individual READMEs, this is what I've
> > > gathered so far at a high level...
> > >
> > > - *Metron-Pcap_Service*: Example service that grabs packets and stores
> > >   them in HBase.
> > > - *Metron-DataServices*: How the messages (events) get into the
> > >   pipeline.
> > > - *Metron-MessageParsers*: Takes raw messages (which can be in binary
> > >   formats) and converts them to a common format of source/destination
> > >   IP/port/protocol with timestamp + message. Looks like a couple of the
> > >   parsing patterns were forked from Logstash.
> > > - *Metron-EnrichmentAdapters*: As the messages come in, extra metadata
> > >   can be added, like geo, whois, etc. So I guess the parsed message plus
> > >   any enrichment adapters you have enabled would be "the model".
> > > - *Metron-DataLoads*: How to get the enrichment data into the system.
> > > - *Metron-Alerts*: Sends the message onto the message stream as
> > >   normal, but also sends it to the alert stream.
> > > - *Metron-Indexing*: This is the main output of the streaming system,
> > >   which is currently Elasticsearch/Kibana (v3)… but it looks like you're
> > >   in the middle of adding Solr support too.
> > > - *Metron-Topologies*: Configures all of this to meet your needs
> > >   (e.g. which telemetry sources you want to collect).
> > > - *Metron-Testing*: Tests this whole thing without needing servers or
> > >   data.
> > > - *Metron-Common*: Dev tools/packages shared across modules.
> > >
> > > Totally not looking for someone to blow a bunch of time on a super
> > > detailed response; just curious if I'm totally off base on any of these
> > > modules or if I missed something super big.
> > >
> > > Thanks!
> > > John
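The split/join enrichment style mentioned in the thread (see METRON-35) can be illustrated with a minimal, self-contained Java sketch. It is not Metron code: the `SketchAdapter` interface, the `SplitJoinEnrichment` class and the `enrichments.<adapter>.<field>.<key>` field naming are all hypothetical. In a Storm topology the split and join steps would typically be separate components; here they are collapsed into one JVM, with `CompletableFuture` standing in for the parallel enrichment workers, purely to show the shape of the data flow: one message fans out into independent lookups, and the partial results are merged back into a single enriched message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical adapter contract for this sketch only; it is NOT Metron's EnrichmentAdapter API.
interface SketchAdapter {
    String name();                                  // used to namespace the fields this adapter adds
    Map<String, Object> enrich(Object fieldValue);  // one lookup (e.g. geo, threat intel) for one value
}

final class SplitJoinEnrichment {
    private SplitJoinEnrichment() {}

    static Map<String, Object> enrich(Map<String, Object> message,
                                      List<SketchAdapter> adapters,
                                      List<String> fields) {
        // Split: fan out one asynchronous lookup per (adapter, field) pair.
        List<CompletableFuture<Map<String, Object>>> parts = new ArrayList<>();
        for (SketchAdapter adapter : adapters) {
            for (String field : fields) {
                Object value = message.get(field);
                if (value == null) {
                    continue; // this message has no such field, so there is nothing to look up
                }
                parts.add(CompletableFuture.supplyAsync(() -> {
                    Map<String, Object> prefixed = new HashMap<>();
                    // Hypothetical naming scheme: enrichments.<adapter>.<field>.<key>
                    adapter.enrich(value).forEach((k, v) ->
                            prefixed.put("enrichments." + adapter.name() + "." + field + "." + k, v));
                    return prefixed;
                }));
            }
        }
        // Join: wait for every partial result and merge it into a copy of the original message.
        Map<String, Object> joined = new LinkedHashMap<>(message);
        for (CompletableFuture<Map<String, Object>> part : parts) {
            joined.putAll(part.join());
        }
        return joined;
    }
}
```

Because the lookups are independent, they can run concurrently, so (given enough worker threads) a message's enrichment latency is bounded by its slowest lookup rather than the sum of all of them. The original message is copied rather than mutated, so the join step never races with the split step.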

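Casey notes that enrichment adapters read their configuration from ZooKeeper so they can be adjusted in a running topology, and points to ConfiguredBolt and GenericEnrichmentBolt for the real code. The following is not that code; it is a hypothetical, minimal Java sketch of the underlying pattern, assuming the ZooKeeper watch (not shown) delivers each new configuration as a map. The class name, the `onConfigChanged` callback and the `<name>.enabled` keys are all invented for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a bolt whose settings live in ZooKeeper; NOT Metron's ConfiguredBolt.
// A real implementation would register a ZooKeeper watch (for example through Apache Curator)
// and call onConfigChanged(...) from the watch callback; that wiring is deliberately omitted.
final class ReconfigurableEnricher {
    private final AtomicReference<Map<String, Object>> config;

    ReconfigurableEnricher(Map<String, Object> initial) {
        this.config = new AtomicReference<>(snapshot(initial));
    }

    // Invoked when the watched configuration changes: replace the whole snapshot atomically.
    void onConfigChanged(Map<String, Object> updated) {
        config.set(snapshot(updated));
    }

    // Invoked per message: reads one immutable snapshot, so a concurrent update is either
    // fully visible to this call or not visible at all, and no restart is needed to pick it up.
    boolean isEnabled(String enrichmentName) {
        return Boolean.TRUE.equals(config.get().get(enrichmentName + ".enabled"));
    }

    // Defensive copy so later changes to the caller's map cannot leak into the live config.
    private static Map<String, Object> snapshot(Map<String, Object> source) {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }
}
```

Swapping an immutable snapshot through an `AtomicReference` keeps the per-message read path lock-free; the trade-off is that every update copies the whole configuration, which is usually cheap for small configuration maps.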