Hi Casey,

Thanks.
What I meant was this: are we discussing the architecture changes and the refactoring plans somewhere in the open? Is there any UI design work happening in the open?

debo

On 4/1/16, 7:54 AM, "Casey Stella" <[email protected]> wrote:

>Hi Debo,
>
>Thanks! I'm glad that it's useful. The issue is that most things are in flux at the moment. That being said, we have some pretty complete documentation at
>https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture
>This email was an attempt to connect the code to the architecture to help out new contributors. I definitely think something should be put up around this theme on the wiki or the website.
>
>As things move forward, we'll definitely work to keep the documentation current.
>
>Casey
>
>On Fri, Apr 1, 2016 at 10:39 AM, Debo Dutta (dedutta) <[email protected]> wrote:
>
>> Hi Casey
>>
>> This is a good intro. We should have this on our web pages. On the topic of metron streaming re-arch and re-factor, is there a document that is being worked on?
>>
>> The dev list is quiet :)
>>
>> debo
>>
>> On 4/1/16, 6:08 AM, "Casey Stella" <[email protected]> wrote:
>>
>> >Hey John,
>> >
>> >First of all, thanks for the contributions. Contributions make open source work, so thanks so much for that.
>> >
>> >The structure of metron-streaming will likely be shifting. The lay of the land is that the last few months have seen a rearchitecture of a lot of the old OpenSOC code. As it stands, there's some code that is no longer used and the organization could use some work. As such, expect this structure to shift a bit. This is one of the reasons that there's been less formal documentation than there will be going forward (I promise :).
>> >
>> >However, let's consider the structure as it stands now (I am going to skip the projects that I do not believe are being actively used).
>> >This is just intended to give some color to the good work already done at
>> >https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture:
>> >
>> > - Metron-Pcap_Service
>> >    - This is the REST service which serves up packet capture data from HBase (at present). The requests come in through the pcap panel in Kibana.
>> >       - Check out org.apache.metron.pcapservice.PcapGetterHBaseImpl to see how this works.
>> >       - I'd recommend looking at the unit test org.apache.metron.pcapservice.PcapGetterHBaseImplTest.
>> > - Metron-Topologies
>> >    - At this point, this project mostly holds the Storm topologies in the form of Flux YAML files. There are generally two types of topologies: parser topologies and the enrichment topology.
>> >    - The aim of the sensor-specific topologies is to take the raw sensor output and normalize it to some extent. The input is the raw sensor data via Kafka, and the output is semi-normalized JSON written to Kafka (there is still sensor-specific stuff in there, but we ensure that the source and destination IP/port and the protocol are all present under predictable field names).
>> >       - Yaf: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/yaf
>> >       - Bro: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/bro
>> >       - Snort: metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/snort
>> >    - The enrichment topology is intended to pull the quasi-normalized JSON out and add enrichments.
>> >      Enrichments come in two varieties now: threat intelligence, and enrichments such as geo tagging.
>> >       - The topology is at metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/enrichment
>> >       - By *far* the best way to understand what is going on enrichment-wise is to look at the integration test at metron-streaming/Metron-Topologies/src/test/java/org/apache/metron/integration/EnrichmentIntegrationTest.java. This test spins up in-memory instances of Storm, Kafka and a mock HBase table and runs real data through the topology, ensuring the output is what we would expect.
>> >    - Due to volume, the pcap data actually skips the enrichment topology and goes directly to HBase (see metron-streaming/Metron-Topologies/src/main/resources/Metron_Configs/topologies/pcap).
>> > - Metron-EnrichmentAdapters
>> >    - This is where the actual enrichment adapters live. The threat intel adapters also live here. This is going through a bit of churn. The thing to note about the enrichments is that we have moved to a split/join style architecture. More on that can be found in the documentation associated with https://issues.apache.org/jira/browse/METRON-35
>> >    - One more thing to note: enrichment adapters take their configuration from ZooKeeper so that we can adjust them in a running topology without taking the topology down. See ConfiguredBolt and GenericEnrichmentBolt for reasonable examples of how that looks.
>> > - Metron-Indexing
>> >    - This is largely going to get split into two projects, for Elasticsearch and Solr, but there is also an HDFS indexing bolt (sending enriched messages to HDFS for future analysis) that might be of interest. Again, the EnrichmentIntegrationTest drives data through these pathways.
>> > - Metron-DataLoads
>> >    - This is a project intended to load data into HBase for use in the enrichment and threat intel adapters. Right now, in the current RC, this is just for threat intel.
>> >    - The loaders currently supported are:
>> >       - Loading CSV files or STIX files via MapReduce into HBase (see ThreatIntelBulkLoader and the associated integration test BulkLoadMapperIntegrationTest)
>> >       - Loading threat intel data via a TAXII feed (see TaxiiLoader and the associated integration test TaxiiIntegrationTest)
>> >    - In a PR I submitted today, this will be generalized to support loading enrichment data into HBase, along with an accompanying enrichment adapter which pulls enrichment data from HBase. There will also be a flat file loader, so you can point to a CSV file and load enrichment or threat intel data into HBase.
>> > - Metron-MessageParsers
>> >    - You have the right of it below.
>> > - Metron-Common
>> >    - Common utilities
>> >
>> >Anyway, I hope that helps. I'd recommend digging into the tests, especially the EnrichmentIntegrationTest, to see how things work. Also, watch out for the structure to shift under your feet for a bit here.
>> >
>> >Hope this helps!
>> >
>> >Looking forward to more PRs. :)
>> >
>> >Casey
>> >
>> >On Fri, Apr 1, 2016 at 12:38 AM, John <[email protected]> wrote:
>> >
>> >> Hello Dev@Metron,
>> >>
>> >> I've been thinking about getting more involved with Metron. I've already submitted a couple of very simple PRs that got approved, and one is now merged into master. The Ansible and Vagrant scripts have made it super easy to spin up a 10-node setup in AWS or a local VM setup for testing. So now I'm diving into the Metron-Streaming modules to try and figure out what role each of them plays.
>> >> I haven't dug super deep yet, so based on the little I've seen, plus the individual READMEs, this is what I've gathered so far at a high level:
>> >>
>> >>    - *Metron-Pcap_Service* : Example service that grabs packets and stores them to HBase.
>> >>    - *Metron-DataServices* : How the messages (/events) get into the pipeline.
>> >>    - *Metron-MessageParsers* : Takes raw messages (which can be binary formats) and converts them to a common format of source/destination IP/port/protocol w/ timestamp+message. Looks like a couple of the parsing patterns were forked from Logstash.
>> >>    - *Metron-EnrichmentAdapters* : As the messages come in, extra metadata can be added, like geo, whois, etc. So I guess the parsed message + any enrichment adapters you have enabled would be "the model".
>> >>    - *Metron-DataLoads* : How to get the enrichment data into the system.
>> >>    - *Metron-Alerts* : Sends the message onto the message stream like normal, but will also send it to the alert stream.
>> >>    - *Metron-Indexing* : This is the main output of the streaming system, which is currently Elasticsearch/Kibana (v3)... but it looks like you're in the middle of adding Solr support too.
>> >>    - *Metron-Topologies* : To configure all this stuff to meet your needs (e.g. which telemetries you want to collect).
>> >>    - *Metron-Testing* : To test this whole thing without needing servers or data.
>> >>    - *Metron-Common* : Dev tools/packages shared across modules.
>> >>
>> >> Totally not looking for someone to blow a bunch of time on a super detailed response; just curious if I'm totally off base on any of these modules or if I missed something super big.
>> >>
>> >> Thanks!
>> >> John
>> >>
>>
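The parser-topology contract Casey describes in the thread is small: whatever a sensor emits, the semi-normalized JSON written to Kafka always carries source/destination IP and port plus protocol under predictable names, while sensor-specific fields may ride along. A minimal sketch of checking that contract follows. The field names `ip_src_addr`, `ip_dst_addr`, `ip_src_port`, `ip_dst_port` and `protocol` are an assumption for illustration (the thread does not spell them out), and `parse_pipe_delimited` is a hypothetical parser, not one of Metron's.

```python
import json

# Assumed names for the fields every parser must emit. The thread only says
# they are "predictable", so treat these exact names as illustrative.
REQUIRED_FIELDS = ("ip_src_addr", "ip_dst_addr", "ip_src_port", "ip_dst_port", "protocol")


def missing_fields(message: dict) -> list:
    """Return the required fields that are absent from a parsed message."""
    return [f for f in REQUIRED_FIELDS if f not in message]


def parse_pipe_delimited(raw: str) -> dict:
    """Hypothetical parser: map a pipe-delimited record onto the common fields.

    The trailing sensor-specific value is kept as-is next to the normalized
    keys, mirroring the "semi-normalized" output described in the thread.
    """
    src, sport, dst, dport, proto, extra = raw.split("|")
    return {
        "ip_src_addr": src,
        "ip_src_port": int(sport),
        "ip_dst_addr": dst,
        "ip_dst_port": int(dport),
        "protocol": proto,
        "sensor_extra": extra,  # sensor-specific, deliberately not normalized
    }


if __name__ == "__main__":
    msg = parse_pipe_delimited("10.0.0.1|5353|10.0.0.2|53|UDP|flow=42")
    assert missing_fields(msg) == []
    print(json.dumps(msg, sort_keys=True))
```

Checking only for the presence of the shared keys, rather than rejecting unknown ones, is what lets each sensor keep its own extra fields while downstream enrichment can still rely on the common ones.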

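The split/join enrichment style and the ZooKeeper-driven adapter configuration mentioned in the thread can be illustrated together. This is a sketch under stated assumptions, not Metron's implementation (the real one lives in Storm bolts such as GenericEnrichmentBolt, per METRON-35): one message is split into a work item per enabled adapter, each adapter returns only its additions, and a joiner buffers partial results by message id until every expected part has arrived. The adapter names (`geo`, `threat_intel`), the lookup tables and the `EnrichmentConfig` holder are all hypothetical; calling `EnrichmentConfig.update` stands in for a ZooKeeper change reaching a running topology.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical adapter signature: take a message, return only the new fields.
Adapter = Callable[[dict], dict]

# Made-up lookup data standing in for HBase-backed enrichment tables.
GEO_TABLE = {"10.0.0.2": "US"}
THREAT_TABLE = {"10.0.0.2": "botnet"}


def geo_adapter(msg: dict) -> dict:
    return {"geo.dst_country": GEO_TABLE.get(msg["ip_dst_addr"], "unknown")}


def threat_adapter(msg: dict) -> dict:
    return {"threat.is_alert": msg["ip_dst_addr"] in THREAT_TABLE}


ADAPTERS: Dict[str, Adapter] = {"geo": geo_adapter, "threat_intel": threat_adapter}


class EnrichmentConfig:
    """Stand-in for configuration pushed through ZooKeeper: each message reads
    the current enabled list, so an update takes effect without a restart."""

    def __init__(self, enabled: List[str]):
        self.enabled = list(enabled)

    def update(self, enabled: List[str]) -> None:
        self.enabled = list(enabled)


def split(msg_id: str, msg: dict, config: EnrichmentConfig) -> List[Tuple[str, str, dict]]:
    """Fan one message out into one work item per enabled adapter."""
    return [(msg_id, name, msg) for name in config.enabled]


class Joiner:
    """Buffer partial results per message id; emit once all parts arrive."""

    def __init__(self) -> None:
        self.pending: Dict[str, List[dict]] = {}

    def offer(self, msg_id: str, original: dict, expected: int, additions: dict) -> Optional[dict]:
        parts = self.pending.setdefault(msg_id, [])
        parts.append(additions)
        if len(parts) < expected:
            return None  # still waiting on other adapters for this message
        del self.pending[msg_id]
        enriched = dict(original)  # copy, so the input message is untouched
        for part in parts:
            enriched.update(part)
        return enriched


def enrich(msg_id: str, msg: dict, config: EnrichmentConfig, joiner: Joiner) -> dict:
    """Run split -> adapters -> join sequentially (a topology would parallelize)."""
    work = split(msg_id, msg, config)
    if not work:
        return dict(msg)  # nothing enabled: pass the message through unchanged
    result: Optional[dict] = None
    for mid, name, item in work:
        out = joiner.offer(mid, item, len(work), ADAPTERS[name](item))
        if out is not None:
            result = out
    assert result is not None  # the last offer always completes the join
    return result
```

With `EnrichmentConfig(["geo"])` only `geo.dst_country` is added; after `config.update(["geo", "threat_intel"])` the next message also gets `threat.is_alert`, with nothing restarted. Keying the join on a message id is the property a split/join design relies on to run adapters independently and still reassemble a single enriched message.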