Awesome, Casey & Ryan! I wasn't aware of the wiki, so I'll definitely be reading through that. And the extra details you sent, Casey, will definitely help too. Thanks!

On Fri, Apr 1, 2016 at 10:54 AM, Casey Stella wrote:

> Hi Debo,
>
> Thanks! I'm glad that it's useful. The issue is that most things are in flux at the moment. That being said, we have some pretty complete documentation at
> https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture
> This email was an attempt to connect the code to the architecture to help out new contributors. I definitely think something should be put up around this theme on the wiki or the website.
>
> As things move forward, we'll definitely work to keep the documentation current.
>
> Casey
>
> On Fri, Apr 1, 2016 at 10:39 AM, Debo Dutta (dedutta) wrote:
>
> > Hi Casey
> >
> > This is a good intro. We should have this on our web pages. On the topic of metron streaming re-arch and re-factor, is there a document that is being worked on?
> >
> > The dev list is quiet :)
> >
> > debo
> >
> > On 4/1/16, 6:08 AM, "Casey Stella" wrote:
> >
> > > Hey John,
> > >
> > > First of all, thanks for the contributions. Contributions make open source work, so thanks so much for that.
> > >
> > > The structure of metron-streaming will likely be shifting. The lay of the land is that the last few months have seen a rearchitecture of a lot of the old opensoc code. As it stands, there's some code that is no longer used and the organization could use some work. As such, expect this structure to shift a bit. This is one of the reasons that there's been less formal documentation than there will be going forward (I promise :).
> > >
> > > However, let's consider the structure as it stands now (I am going to skip the projects that I do not believe are being actively used).
> > > This is just intended to give some color to the good work already done at
> > > https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture:
> > >
> > > - Metron-Pcap_Service
> > >   - This is the REST service which serves up packet capture data from HBase (at present). The requests come in through the pcap panel in kibana.
> > >   - Check out org.apache.metron.pcapservice.PcapGetterHBaseImpl to see how this works.
> > >   - I'd recommend looking at the unit test org.apache.metron.pcapservice.PcapGetterHBaseImplTest
> > > - Metron-Topologies
> > >   - This project mostly, at this point, holds the Storm topologies in the form of Flux yaml files. There are generally two types of topologies, parser topologies and the enrichment topology.
> > >   - The aim of the sensor-specific topologies is to take the raw sensor output and normalize it to some extent. The input is the raw sensor data via kafka and the output is a semi-normalized JSON (there is still sensor-specific stuff in there, but we ensure that src, dest ip/port and protocol are all there in predictable field names) to Kafka.
> > >     - Yaf: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/yaf
> > >     - Bro: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/bro
> > >     - Snort: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/snort
> > >   - The enrichment topology is intended to pull the quasi-normalized JSON out and add enrichments.
> > >     Enrichments come in two varieties now, threat intelligence and enrichments such as geo tagging.
> > >     - The topology is at metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/enrichment
> > >     - By *far* the best way to understand what is going on enrichment-wise is to look at the integration test at metron-streaming/Metron-Topologies/src/test/java/org/apache/metron/integration/EnrichmentIntegrationTest.java. This test spins up in-memory instances of storm, kafka and a mock HBase table and runs real data through the topology, ensuring the output is what we would expect.
> > >   - Due to volume, the pcap data actually skips the enrichment topology and goes directly to HBase (see metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/pcap).
> > > - Metron-EnrichmentAdapters
> > >   - This is where the actual enrichment adapters live. Also, threat intel adapters live here. This is in the process of a bit of churn. The thing to note about the enrichments is that we have moved to a split/join style architecture. More on that can be found in the documentation associated with https://issues.apache.org/jira/browse/METRON-35
> > >   - One more thing to note, enrichment adapters take their configuration from zookeeper so that we can adjust them in a running topology without taking the topology down. See ConfiguredBolt and GenericEnrichmentBolt for reasonable examples of how that looks.
> > > - Metron-Indexing
> > >   - This is largely going to get split into two projects for Elasticsearch and Solr, but there is also an HDFS indexing bolt (sending enriched messages to HDFS for future analysis) that might be of interest. Again, the EnrichmentIntegrationTest drives data through these pathways.
> > > - Metron-DataLoads
> > >   - This is a project intended to load data into HBase for use in the enrichment and threat intel adapters. Right now, in the current RC, this is just for threat intel.
> > >   - The loaders supported currently are:
> > >     - Loading CSV files or Stix files via mapreduce into HBase (see ThreatIntelBulkLoader and the associated integration test BulkLoadMapperIntegrationTest)
> > >     - Loading threat intel data via a Taxii feed (see TaxiiLoader and the associated integration test TaxiiIntegrationTest)
> > >   - In a PR submitted today by me, this will be generalized to support loading enrichment data into HBase along with an accompanying enrichment adapter which pulls enrichment data from HBase. Also, there will be a flat file loader, so you can point to a CSV file and load enrichment or threat intel data into HBase.
> > > - Metron-MessageParsers
> > >   - You have the right of it below
> > > - Metron-Common
> > >   - Common utilities
> > >
> > > Anyway, I hope that helps. I'd recommend digging into the tests, especially the EnrichmentIntegrationTest, to see how things work. Also, watch out for the structure to shift under your feet for a bit here.
> > >
> > > Hope this helps!
> > >
> > > Looking forward to more PRs. :)
> > >
> > > Casey
> > >
> > > On Fri, Apr 1, 2016 at 12:38 AM, John wrote:
> > >
> > > > Hello Dev@Metron,
> > > >
> > > > I've been thinking about getting more involved with Metron. I've already submitted a couple of very simple PRs that got approved and one is now merged into master. The ansible and vagrant scripts have made it super easy to spin up a 10-node setup in AWS or a local VM setup for testing. So now I'm diving into the Metron-Streaming modules to try and figure out what roles each of them play.
> > > > I haven't dug super deep yet, so based on the little I've seen, plus the individual READMEs -- this is what I've gathered so far at a high level...
> > > >
> > > > - *Metron-Pcap_Service* : Example service that grabs packets and stores them to HBase.
> > > > - *Metron-DataServices* : How the messages(/events) get into the pipeline.
> > > > - *Metron-MessageParsers* : Takes raw messages (which can be binary formats) and converts them to a common format of source/destination ip/port/protocol w/ timestamp+message. Looks like a couple of the parsing patterns were forked from Logstash.
> > > > - *Metron-EnrichmentAdapters* : As the messages come in, extra metadata can be added, like geo, whois, etc. So I guess the parsed message + any enrichment adapters you have enabled would be "the model".
> > > > - *Metron-DataLoads* : How to get the enrichment data into the system.
> > > > - *Metron-Alerts* : Sends the message onto the message stream like normal, but will also send it to the alert stream.
> > > > - *Metron-Indexing* : This is the main output of the streaming system, which is currently Elasticsearch/Kibana(v3)… but looks like you're in the middle of adding Solr support too.
> > > > - *Metron-Topologies* : To configure all this stuff to meet your needs (ex. which telemetries you want to collect).
> > > > - *Metron-Testing* : To test this whole thing without needing servers or data.
> > > > - *Metron-Common* : Dev tools/packages shared across modules.
> > > >
> > > > Totally not looking for someone to blow a bunch of time on a super detailed response; just curious if I'm totally off base on any of these modules or if I missed something super big.
> > > >
> > > > Thanks!
> > > > John
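
For anyone skimming the thread, here is a rough sketch of the flow Casey describes: a parser step that normalizes sensor output into predictable field names, followed by a split/join style enrichment step. This is not Metron code (Metron is Java on Storm, and its split and join stages run as separate bolts); it is a small, sequential Python illustration only. Every name in it is an assumption made up for the example: the field names (`src_ip`, `dst_ip`, ...), the raw sensor keys, the `enrichments.<adapter>.<field>` key layout, the two toy adapters, and the in-memory lookup tables standing in for HBase.

```python
import json

# Hypothetical normalized field names. The thread only says that src, dest
# ip/port and protocol end up in predictable fields; it does not name them.
def normalize(raw):
    """Parser step: map sensor-specific keys onto common field names."""
    return {
        "src_ip": raw["sip"],
        "src_port": int(raw["sp"]),
        "dst_ip": raw["dip"],
        "dst_port": int(raw["dp"]),
        "protocol": raw["proto"],
    }

# Toy lookup tables standing in for HBase-backed enrichment and threat intel data.
GEO = {"10.0.0.5": "US"}
THREAT_INTEL = {"203.0.113.7"}

def geo_adapter(ip):
    return {"country": GEO.get(ip)}

def threat_adapter(ip):
    return {"is_malicious": ip in THREAT_INTEL}

# Hypothetical adapter registry and the fields each adapter is applied to.
ADAPTERS = {"geo": geo_adapter, "threat": threat_adapter}
ENRICHED_FIELDS = ("src_ip", "dst_ip")

def split(message):
    """Split: one (adapter, field, value) task per adapter and enriched field."""
    return [(name, field, message[field])
            for name in ADAPTERS
            for field in ENRICHED_FIELDS]

def enrich(task):
    """Run a single adapter on a single field value and namespace its output."""
    name, field, value = task
    result = ADAPTERS[name](value)
    return {f"enrichments.{name}.{field}.{key}": val for key, val in result.items()}

def join(message, partials):
    """Join: merge all partial enrichment results back into one message."""
    joined = dict(message)
    for partial in partials:
        joined.update(partial)
    return joined

def process(raw):
    message = normalize(raw)
    # Sequential here; in a real topology the tasks would fan out in parallel.
    partials = [enrich(task) for task in split(message)]
    return join(message, partials)

if __name__ == "__main__":
    raw = {"sip": "10.0.0.5", "sp": "1234", "dip": "203.0.113.7", "dp": "80", "proto": "TCP"}
    print(json.dumps(process(raw), indent=2, sort_keys=True))
```

The point of splitting per adapter and field is that each lookup is independent, so slow adapters do not block fast ones and new adapters can be added without touching the join logic.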
