Some of the HBase bolt related classes were created in OpenSOC because, at that time, Storm's HBase bolt did not have all the necessary features (ability to add custom configs, enable/disable WAL, easy tuple mapping, etc.). We should re-evaluate whether we can leverage these components from Storm itself to avoid additional maintenance.
Some observations and pointers for more thought:

* HbaseConverter should be H*B*aseConverter to match other cases.
* org.apache.metron.enrichment.bolt.HBaseBolt.java is in the bolt package, but the other HBase components are in the hbase package.
* It may be better to base the project structure on functional grouping rather than a mix of function and implementation choices. For example, solr and es could probably be packages rather than sub-modules (unless the intention is to support more such "pluggable" indexing mechanisms at any given point).
* parsers/enrichments: are they expected to be reused across multiple projects? If yes, are they different from common? If not, should they be packages instead?
* From a deployment perspective, there are essentially the following broad categories:
  1. Data Acquisition (pcap, nifi, flume, kafka writer, etc.)
  2. Active Analysis (real-time pieces: kafka, storm topology, bolts, parsers, enrichments, alerts, etc.)
  3. Deep Analytics (historic data analysis using ML, MR/Hive/Tez/Spark related components)
  4. Data Access (APIs, UI, etc.)

Would it make sense to create the project structure around such functional groupings?

On Mon, Apr 18, 2016 at 1:46 PM, James Sirota <[email protected]> wrote:

> Hi Ryan,
>
> This is great. You should attach this to the Jira when you are ready to
> commit the reorg so we know which parts shifted.
>
> Thanks,
> James
>
> On 4/18/16, 1:30 PM, "Ryan Merriman" <[email protected]> wrote:
>
> >Thanks Frank. I’ve updated those in the spreadsheet.
> >
> >On 4/18/16, 3:27 PM, "Frank Lu" <[email protected]> wrote:
> >
> >>As of now, I think the following classes are not used:
> >>
> >>Metron-EnrichmentAdapters
> >>  org.apache.metron.enrichment.adapters.cif.AbstractCIFAdapter.java
> >>  org.apache.metron.enrichment.adapters.cif.CIFHbaseAdapter.java
> >>  org.apache.metron.enrichment.adapters.whois.WhoisHBaseAdapter.java
> >>
> >>Metron-DataLoads
> >>  org.apache.metron.dataloads.cif.HBaseTableLoad.java
> >>
> >>Thanks,
> >>Frank Lu
> >>
> >>On 4/18/16, 3:05 PM, "Ryan Merriman" <[email protected]> wrote:
> >>
> >>>All,
> >>>
> >>>I put together a list of all the project java assets that details where
> >>>they will be moved (or potentially deleted) as part of the project
> >>>reorganization. Feedback welcome.
> >>>
> >>>Ryan Merriman
> >>>
> >>>On 4/13/16, 9:42 AM, "James Sirota" <[email protected]> wrote:
> >>>
> >>>>I would not have configs as a project but rather as a folder structure
> >>>>that other modules can point to
> >>>>
> >>>>Thanks,
> >>>>James
> >>>>
> >>>>On 4/13/16, 7:32 AM, "Ryan Merriman" <[email protected]> wrote:
> >>>>
> >>>>>James brings up a good point. I propose adding another project under
> >>>>>metron-platform called metron-configuration. This would be a fairly
> >>>>>lightweight project that would contain anything related to
> >>>>>configuration (property files, json files, flux files, etc).
> >>>>>
> >>>>>On 4/13/16, 8:56 AM, "James Sirota" <[email protected]> wrote:
> >>>>>
> >>>>>>+1 from me.
> >>>>>>
> >>>>>>I would also like to address the configs and make sure the configs are
> >>>>>>in the same place. Do you have ideas on where we would put those?
> >>>>>>
> >>>>>>Thanks,
> >>>>>>James
> >>>>>>
> >>>>>>On 4/13/16, 6:50 AM, "Ryan Merriman" <[email protected]> wrote:
> >>>>>>
> >>>>>>>Thank you for all the feedback everyone.
> >>>>>>>I will attempt to summarize all the input we've received and update
> >>>>>>>my initial proposal. We can discuss further if anyone is still
> >>>>>>>unclear and I will volunteer to capture all the details in a document
> >>>>>>>of some kind once we all come to a consensus.
> >>>>>>>
> >>>>>>>Looks like everyone is in agreement for the top level projects. Nick
> >>>>>>>is working on a task that will require an additional top level
> >>>>>>>project so I am going to add that in as well:
> >>>>>>>
> >>>>>>>metron-deployment
> >>>>>>>metron-platform
> >>>>>>>metron-ui
> >>>>>>>metron-sensors
> >>>>>>>
> >>>>>>>All of these except metron-platform are well understood and don't
> >>>>>>>warrant any more discussion. For metron-platform there seem to be 2
> >>>>>>>areas that are not as clear:
> >>>>>>>
> >>>>>>>- whether we need a common project
> >>>>>>>- how do we organize test related code
> >>>>>>>
> >>>>>>>I agree with David and others that a common project will likely get
> >>>>>>>misused and could become unnecessarily bloated. But I suspect there
> >>>>>>>will be cases where we have common code being used across multiple
> >>>>>>>projects (is already happening). In this case we will either need
> >>>>>>>this common project or we will have to keep common code in one of the
> >>>>>>>other projects and have all other projects extend that. For the
> >>>>>>>latter, an example would be keeping common code in enrichment and
> >>>>>>>having parsers declare enrichment as a dependency.
> >>>>>>>There are a couple downsides I see with this approach:
> >>>>>>>
> >>>>>>>- parser topology jars now bring along all the enrichment
> >>>>>>>  dependencies
> >>>>>>>- since more code from various projects is being packaged together,
> >>>>>>>  version conflicts are more likely and poms become more complicated
> >>>>>>>  due to all the necessary exclusions
> >>>>>>>
> >>>>>>>My thinking is that any jar file being deployed should only contain
> >>>>>>>what it needs. Curious what others think here. My vote would be to
> >>>>>>>maintain a common project (or whatever we want to call it) and be
> >>>>>>>diligent about not letting project-specific code slip in there.
> >>>>>>>
> >>>>>>>I believe Nick was the first person to ask the question about
> >>>>>>>projects related to test code and why we would need separate test and
> >>>>>>>integration test. The reason for this is that our integration-test
> >>>>>>>classes currently depend on other projects (not surprising since they
> >>>>>>>are integration tests). If there are utilities we want to make
> >>>>>>>available to all projects (mock classes, utilities for reading sample
> >>>>>>>data, etc) then they can't live in integration-test because that will
> >>>>>>>introduce circular dependencies. If it is possible to refactor our
> >>>>>>>current Metron-Testing project so that it doesn't depend on any other
> >>>>>>>projects, then we can keep utilities here. Otherwise we need a
> >>>>>>>separate project for testing utilities. I suspect removing other
> >>>>>>>project dependencies from Metron-Testing will prove more difficult
> >>>>>>>than it's worth so my vote would be to have 2 test related projects.
> >>>>>>>
> >>>>>>>So here is where our metron-platform organization stands:
> >>>>>>>
> >>>>>>>metron-common *
> >>>>>>>metron-integration-test *
> >>>>>>>metron-test-utilities *
> >>>>>>>metron-data-management
> >>>>>>>metron-pcap
> >>>>>>>metron-parsers
> >>>>>>>metron-enrichment
> >>>>>>>  metron-solr
> >>>>>>>  metron-elasticsearch
> >>>>>>>metron-api
> >>>>>>>
> >>>>>>>* may or may not change depending on the outcome of this discussion
> >>>>>>>
> >>>>>>>Thoughts?
> >>>>>>>
> >>>>>>>Ryan Merriman
> >>>>>>>
> >>>>>>>On 4/11/16, 4:15 PM, "Debojyoti Dutta" <[email protected]> wrote:
> >>>>>>>
> >>>>>>>>If you load up your Irc client just type
> >>>>>>>>/join #apache-metron-dev
> >>>>>>>>
> >>>>>>>>Sent from my iPhone
> >>>>>>>>
> >>>>>>>>> On Apr 11, 2016, at 12:06 PM, James Sirota <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Great, thanks, Debo. Where can I find instructions on how to get
> >>>>>>>>> to it?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> James
> >>>>>>>>>
> >>>>>>>>>> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi James
> >>>>>>>>>>
> >>>>>>>>>> Ok set it up and ack ...
> >>>>>>>>>>
> >>>>>>>>>> Thx
> >>>>>>>>>>
> >>>>>>>>>>> On 4/10/16, 6:31 PM, "James Sirota" <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Debo,
> >>>>>>>>>>>
> >>>>>>>>>>> I think it would be great if you set it up
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> James
> >>>>>>>>>>>
> >>>>>>>>>>>> On 4/10/16, 6:25 PM, "Debojyoti Dutta" <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have set it up for another open source effort in the past and
> >>>>>>>>>>>> it was not very hard. Am happy to volunteer if needed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thx
> >>>>>>>>>>>> Debo
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Apr 10, 2016, at 5:53 PM, James Sirota <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd be open to an IRC channel. Does anyone know if Apache
> >>>>>>>>>>>>> allows this? If yes, does anyone know how to set one up?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> James
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 4/10/16, 4:52 PM, "Debojyoti Dutta" <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Nick
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I like your suggestions. For the enrichment layer do you think
> >>>>>>>>>>>>>> it would also include any advanced analytics? Else we might
> >>>>>>>>>>>>>> want to have an analytics layer.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It would be good to have an arch which could be extended for
> >>>>>>>>>>>>>> new functionality.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> However Ryan's suggestion of the ui API and deployer also makes
> >>>>>>>>>>>>>> sense.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Should we have an IRC channel to discuss this or maybe
> >>>>>>>>>>>>>> etherpad?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Debo
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Apr 10, 2016, at 4:36 PM, Nick Allen <[email protected]> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It might help to think of our code base as four separate types
> >>>>>>>>>>>>>>> of functionality.
> >>>>>>>>>>>>>>> This is primarily meant to give us a framework to think about
> >>>>>>>>>>>>>>> the organization of Metron (and drive more discussion), rather
> >>>>>>>>>>>>>>> than my proposal for a specific structure.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Sensor - Anything that captures external, non-streaming data
> >>>>>>>>>>>>>>>   and presents it in a form ready for stream processing.
> >>>>>>>>>>>>>>> - Input - Responsible for preparing streaming data for
> >>>>>>>>>>>>>>>   enrichment. The existing "parsers" fit neatly into this space.
> >>>>>>>>>>>>>>> - Enrichment - Responsible for enriching an incoming data feed
> >>>>>>>>>>>>>>>   like geoip, asset enrichment, threat intel lookups, etc.
> >>>>>>>>>>>>>>> - Output - Responsible for persisting data that has been
> >>>>>>>>>>>>>>>   processed by Metron which obviously means search indexers or
> >>>>>>>>>>>>>>>   data stores.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman <[email protected]> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> All,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I would like to propose a review and refactor of the current
> >>>>>>>>>>>>>>>> project organization within Metron. Much of the way the legacy
> >>>>>>>>>>>>>>>> code was organized does not make sense anymore and could be
> >>>>>>>>>>>>>>>> designed so that it is easier to navigate and understand. Our
> >>>>>>>>>>>>>>>> test coverage has increased substantially so I believe we can
> >>>>>>>>>>>>>>>> do this with confidence.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> First off, I think we should agree on a naming convention. I
> >>>>>>>>>>>>>>>> see some projects (YARN and Storm for example) that prepend the
> >>>>>>>>>>>>>>>> sub-project with the name of the top-level project (storm-core
> >>>>>>>>>>>>>>>> for example). Metron also currently does this (Metron-Common).
> >>>>>>>>>>>>>>>> I think that's fine, although in the case of Metron, I feel
> >>>>>>>>>>>>>>>> like having "Metron" prepended is redundant. Regardless of
> >>>>>>>>>>>>>>>> whether we decide to stick with that approach, I propose that
> >>>>>>>>>>>>>>>> project names be uniform and lowercase. For example, under
> >>>>>>>>>>>>>>>> these assumptions "Metron-Common" would change to "common".
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The first level of organization makes sense to me. Only change
> >>>>>>>>>>>>>>>> I would make would be to project names:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> * deployment
> >>>>>>>>>>>>>>>> * streaming
> >>>>>>>>>>>>>>>> * ui
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Or if we want to keep metron in project names:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> * metron-deployment
> >>>>>>>>>>>>>>>> * metron-streaming
> >>>>>>>>>>>>>>>> * metron-ui
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For now I don't see any changes necessary in deployment or ui
> >>>>>>>>>>>>>>>> organization. I see the streaming project structure primarily
> >>>>>>>>>>>>>>>> driven by 2 things: the Maven dependency tree and deployment
> >>>>>>>>>>>>>>>> targets.
> >>>>>>>>>>>>>>>> For example, solr and elasticsearch code should be separated
> >>>>>>>>>>>>>>>> (because their dependencies on lucene conflict) but both will
> >>>>>>>>>>>>>>>> depend on common enrichment code. Also, now that parser,
> >>>>>>>>>>>>>>>> enrichment and pcap topologies are separate, code for those
> >>>>>>>>>>>>>>>> topologies will be deployed as separate jars. No reason to
> >>>>>>>>>>>>>>>> include parser code in enrichment topologies and vice-versa.
> >>>>>>>>>>>>>>>> Any other considerations I'm missing?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> With that being said, here is my initial proposal:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> * common - Any common code that all topologies depend on
> >>>>>>>>>>>>>>>>   (configuration classes, generic writers for example). No
> >>>>>>>>>>>>>>>>   dependencies on other Metron projects.
> >>>>>>>>>>>>>>>> * test - Contains utilities for writing unit tests, sample
> >>>>>>>>>>>>>>>>   configs and sample data. Will depend on common.
> >>>>>>>>>>>>>>>> * integration-test - Contains utilities and classes needed to
> >>>>>>>>>>>>>>>>   run our integration tests (in memory components for example).
> >>>>>>>>>>>>>>>>   Will depend on common and test.
> >>>>>>>>>>>>>>>> * dataload - Contains all code related to data loading. Will
> >>>>>>>>>>>>>>>>   also include any property files needed and integration tests.
> >>>>>>>>>>>>>>>>   Will depend on common, test (test scope), and
> >>>>>>>>>>>>>>>>   integration-test (test scope).
> >>>>>>>>>>>>>>>> * parser - All code specific to the parser topologies.
> >>>>>>>>>>>>>>>>   Would also include scripts, property files, flux files and
> >>>>>>>>>>>>>>>>   parser topology integration tests. This project will depend
> >>>>>>>>>>>>>>>>   on common, test (test scope), and integration-test (test
> >>>>>>>>>>>>>>>>   scope).
> >>>>>>>>>>>>>>>> * enrichment - All code specific to the enrichment topologies
> >>>>>>>>>>>>>>>>   (except solr and elasticsearch). Would also include scripts,
> >>>>>>>>>>>>>>>>   property files, flux files and enrichment topology
> >>>>>>>>>>>>>>>>   integration tests. This project will depend on common, test
> >>>>>>>>>>>>>>>>   (test scope), and integration-test (test scope).
> >>>>>>>>>>>>>>>> * elasticsearch - All Elasticsearch related code. Will depend
> >>>>>>>>>>>>>>>>   on enrichment.
> >>>>>>>>>>>>>>>> * solr - All Solr related code. Will depend on enrichment.
> >>>>>>>>>>>>>>>> * pcap - All code specific to the topology dedicated to pcap.
> >>>>>>>>>>>>>>>>   Would also include scripts, property files, flux files and
> >>>>>>>>>>>>>>>>   pcap integration tests. This project will depend on common,
> >>>>>>>>>>>>>>>>   test (test scope) and integration-test (test scope).
> >>>>>>>>>>>>>>>> * api - This will serve as a generic replacement for
> >>>>>>>>>>>>>>>>   Metron-Pcap_Service. Will contain all code to build a Metron
> >>>>>>>>>>>>>>>>   web service middle layer that can expose APIs through REST or
> >>>>>>>>>>>>>>>>   other client protocols.
> >>>>>>>>>>>>>>>>   Could possibly depend on all other projects or be separated
> >>>>>>>>>>>>>>>>   further if version conflicts arise (separate api projects for
> >>>>>>>>>>>>>>>>   solr and elasticsearch for example).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Looking forward to hearing everyone's feedback and great ideas.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Ryan Merriman
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> Nick Allen <[email protected]>
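
To make the dependency discussion in the thread concrete, here is a minimal sketch that writes down the module dependencies from Ryan's initial proposal and derives a build order, failing if there is a cycle. It is only an illustration: the class ModuleDependencyCheck and its methods are hypothetical and not part of Metron, test-scope dependencies are treated as ordinary edges, and api is left out because its dependencies were left open in the proposal. The second half of main shows the circular dependency Ryan describes: if integration-test depends on another project (parser is picked here purely as an example) while that project also depends on integration-test for shared utilities, no build order exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Metron: models module dependencies and
// derives a build order, failing if the graph contains a cycle.
public class ModuleDependencyCheck {

    // Dependencies as listed in the initial proposal. Test-scope dependencies
    // are included as plain edges; api is omitted (left open in the thread).
    public static Map<String, List<String>> proposedModules() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        List<String> base = Arrays.asList("common", "test", "integration-test");
        deps.put("common", Collections.<String>emptyList());
        deps.put("test", Arrays.asList("common"));
        deps.put("integration-test", Arrays.asList("common", "test"));
        deps.put("dataload", base);
        deps.put("parser", base);
        deps.put("enrichment", base);
        deps.put("elasticsearch", Arrays.asList("enrichment"));
        deps.put("solr", Arrays.asList("enrichment"));
        deps.put("pcap", base);
        return deps;
    }

    // Depth-first topological sort: every module appears after its dependencies.
    public static List<String> buildOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : deps.keySet()) {
            visit(module, deps, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(module)) {
            return;
        }
        // Reaching a module that is still being visited means we followed a cycle.
        if (!inProgress.add(module)) {
            throw new IllegalStateException("circular dependency involving " + module);
        }
        for (String dep : deps.getOrDefault(module, Collections.<String>emptyList())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module);
    }

    public static void main(String[] args) {
        System.out.println("build order: " + buildOrder(proposedModules()));

        // Assumed scenario: integration-test depends on parser, while parser
        // still depends on integration-test for shared test utilities.
        Map<String, List<String>> cyclic = proposedModules();
        cyclic.put("integration-test", Arrays.asList("common", "test", "parser"));
        try {
            buildOrder(cyclic);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping shared utilities in a module that depends only on common (the "test" / metron-test-utilities idea from the thread) is what keeps the first graph acyclic.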
