Hi Casey,

This is a good intro. We should have this on our web pages. On the topic of the metron-streaming re-arch and refactor, is there a document that is being worked on?
The dev list is quiet :)

debo

On 4/1/16, 6:08 AM, "Casey Stella" <[email protected]> wrote:

>Hey John,
>
>First of all, thanks for the contributions. Contributions make open source
>work, so thanks so much for that.
>
>The structure of metron-streaming will likely be shifting. The lay of the
>land is that the last few months have seen a rearchitecture of a lot of the
>old opensoc code. As it stands, there's some code that is no longer used
>and the organization could use some work. As such, expect this structure
>to shift a bit. This is one of the reasons that there's been less formal
>documentation than there will be going forward (I promise :).
>
>However, let's consider the structure as it stands now (I am going to skip
>the projects that I do not believe are being actively used). This is just
>intended to give some color to the good work already done at
>https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture:
>
> - Metron-Pcap_Service
>   - This is the REST service which serves up packet capture data from
>     HBase (at present). The requests come in through the pcap panel
>     in kibana.
>   - Check out org.apache.metron.pcapservice.PcapGetterHBaseImpl to see
>     how this works.
>   - I'd recommend looking at the unit test
>     org.apache.metron.pcapservice.PcapGetterHBaseImplTest
> - Metron-Topologies
>   - This project mostly, at this point, holds the Storm topologies in the
>     form of Flux yaml files. There are generally two types of topologies,
>     parser topologies and the enrichment topology.
>   - The aim of the sensor-specific topologies is to take the raw
>     sensor output and normalize it to some extent. The input is the raw
>     sensor data via kafka and the output is a semi-normalized JSON (there
>     is still sensor-specific stuff in there, but we ensure that src, dest
>     ip/port and protocol are all there in predictable fieldnames) to Kafka.
>     - Yaf:
>       metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/yaf
>     - Bro:
>       metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/bro
>     - Snort:
>       metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/snort
>   - The enrichment topology is intended to pull the quasi-normalized
>     JSON out and add enrichments. Enrichments come in two varieties now,
>     threat intelligence and enrichments such as geo tagging.
>     - The topology is at
>       metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/enrichment
>     - By *far* the best way to understand what is going on
>       enrichment-wise is to look at the integration test at
>       metron-streaming/Metron-Topologies/src/test/java/org/apache/metron/integration/EnrichmentIntegrationTest.java.
>       This test spins up in-memory instances of storm, kafka and a mock
>       HBase table and runs real data through the topology, ensuring the
>       output is what we would expect.
>   - Due to volume, the pcap data actually skips the enrichment topology
>     and goes directly to HBase (see
>     metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/pcap).
> - Metron-EnrichmentAdapters
>   - This is where the actual enrichment adapters live. Also, threat intel
>     adapters live here. This is in the process of a bit of churn. The thing
>     to note about the enrichments is that we have moved to a split/join
>     style architecture. More on that can be found in the documentation
>     associated with https://issues.apache.org/jira/browse/METRON-35
>   - One more thing to note, enrichment adapters take their
>     configuration from zookeeper so that we can adjust them in a running
>     topology without taking the topology down. See ConfiguredBolt and
>     GenericEnrichmentBolt for reasonable examples of how that looks.
> - Metron-Indexing
>   - This is largely going to get split into two projects for Elasticsearch
>     and Solr, but there is also an HDFS indexing bolt (sending enriched
>     messages to HDFS for future analysis) that might be of interest. Again,
>     the EnrichmentIntegrationTest drives data through these pathways.
> - Metron-DataLoads
>   - This is a project intended to load data into HBase for use in the
>     enrichment and threat intel adapters. Right now, in the current RC, this
>     is just for threat intel.
>   - The loaders supported currently are:
>     - Loading CSV files or Stix files via mapreduce into HBase (see
>       ThreatIntelBulkLoader and the associated integration test
>       BulkLoadMapperIntegrationTest)
>     - Loading threat intel data via a Taxii feed (see TaxiiLoader and
>       the associated integration test TaxiiIntegrationTest)
>   - In a PR submitted today by me, this will be generalized to support
>     loading enrichment data into HBase along with an accompanying enrichment
>     adapter which pulls enrichment data from HBase. Also, there will be a
>     flat file loader, so you can point to a CSV file and load enrichment or
>     threat intel data into HBase.
> - Metron-MessageParsers
>   - You have the right of it below
> - Metron-Common
>   - Common utilities
>
>Anyway, I hope that helps. I'd recommend digging into the tests,
>especially the EnrichmentIntegrationTest to see how things work. Also,
>watch out for the structure to shift under your feet for a bit here.
>
>Hope this helps!
>
>Looking forward to more PRs. :)
>
>Casey
>
>On Fri, Apr 1, 2016 at 12:38 AM, John <[email protected]> wrote:
>
>> Hello Dev@Metron,
>>
>> I've been thinking about getting more involved with Metron. I've already
>> submitted a couple very simple PRs that got approved and one is now merged
>> into master. The ansible and vagrant scripts have made it super easy to
>> spin up a 10-node setup in AWS or a local VM setup for testing.
>> So now I'm diving into the Metron-Streaming modules to try and figure out
>> what role each of them plays. I haven't dug super deep yet, so based on
>> the little I've seen, plus the individual READMEs -- this is what I've
>> gathered so far at a high level...
>>
>> - *Metron-Pcap_Service* : Example service that grabs packets and stores
>>   them to HBase.
>> - *Metron-DataServices* : How the messages(/events) get into the
>>   pipeline.
>> - *Metron-MessageParsers* : Takes raw messages (which can be binary
>>   formats) and converts them to a common format of source/destination
>>   ip/port/protocol w/ timestamp+message. Looks like a couple of the
>>   parsing patterns were forked from Logstash.
>> - *Metron-EnrichmentAdapters* : As the messages come in, extra metadata
>>   can be added, like geo, whois, etc. So I guess the parsed message + any
>>   enrichment adapters you have enabled would be "the model".
>> - *Metron-DataLoads* : How to get the enrichment data into the system.
>> - *Metron-Alerts* : Sends the message onto the message stream like
>>   normal, but will also send it to the alert stream.
>> - *Metron-Indexing* : This is the main output of the streaming system,
>>   which is currently Elasticsearch/Kibana(v3)… but looks like you're in
>>   the middle of adding Solr support too.
>> - *Metron-Topologies* : To configure all this stuff to meet your needs
>>   (e.g. which telemetries you want to collect).
>> - *Metron-Testing* : To test this whole thing without needing servers or
>>   data.
>> - *Metron-Common* : Dev tools/packages shared across modules.
>>
>> Totally not looking for someone to blow a bunch of time on a super detailed
>> response; just curious if I'm totally off base on any of these modules or
>> if I missed something super big.
>>
>>
>> Thanks!
>> John
