Hi Ryan,

Here are my thoughts.

I agree with the first level of breakdown: Deployment, Streaming, UI. That makes sense. However, we may want to rethink the name Streaming, because it will now contain a PCAP MR job, which is batch. I would probably just call it Metron-Platform or something like that.

Under Metron-Platform I would have the following projects:

Common - I agree with you that we need it, for the reasons you described. This will help us with code reuse and standardization.

DataManagement - contains data loaders (enrichment, threat intel) + data cleanup and rotation scripts.

PCAP - PCAP Storm topology + PCAP Service + MR job to back the service.

Parsers - Parser topology + parser bolt + parser modules/grok expressions. I think this should be broken up like this to make the incremental cost of adding new topologies as low as possible. To add a new topology we want a user to build and deploy only this jar, and we want this jar to be as light as possible, containing only the code needed to add additional sources.

Enrichment - enrichment topology + threat intel + alerts. One level down, under Enrichment, I would include the Elasticsearch and Solr indexing projects as modules. I don’t think they warrant their own project, but they can be sub-modules of Enrichment.

API - I agree with you that we need this. However, I think this API should wrap the PCAP service and introduce additional services for security and multi-tenancy (discussion threads on these are going around right now). We want our security model to be consistently enforced, so we should build it into this module and expose it as REST services.

What do you think?

Thanks,
James

On 4/8/16, 1:46 PM, "Ryan Merriman" <[email protected]> wrote:

>All,
>
>I would like to propose a review and refactor of the current project
>organization within Metron. Much of the way the legacy code was organized
>does not make sense anymore and could be designed so that it is easier to
>navigate and understand. Our test coverage has increased substantially so I
>believe we can do this with confidence.
>
>First off, I think we should agree on a naming convention. I see some
>projects (YARN and Storm for example) that prepend the sub-project with the
>name of the top-level project (storm-core for example).
>Metron also currently does this (Metron-Common). I think that's fine,
>although in the case of Metron, I feel like having "Metron" prepended is
>redundant. Regardless of whether we decide to stick with that approach, I
>propose that project names be uniform and lowercase. For example, under these
>assumptions "Metron-Common" would change to "common".
>
>The first level of organization makes sense to me. Only change I would make
>would be to project names:
>
> * deployment
> * streaming
> * ui
>
>Or if we want to keep metron in project names:
>
> * metron-deployment
> * metron-streaming
> * metron-ui
>
>For now I don't see any changes necessary in deployment or ui organization. I
>see the streaming project structure primarily driven by 2 things: the Maven
>dependency tree and deployment targets. For example, solr and elasticsearch
>code should be separated (because their dependency on lucene conflicts) but
>both will depend on common enrichment code. Also, now that parser, enrichment
>and pcap topologies are separate, code for those topologies will be deployed
>as separate jars. No reason to include parser code in enrichment topologies
>and vice-versa. Any other considerations I'm missing?
>
>With that being said, here is my initial proposal:
>
> * common - Any common code that all topologies depend on (configuration
>   classes, generic writers for example). No dependencies on other Metron
>   projects.
> * test - Contains utilities for writing unit tests, sample configs and
>   sample data. Will depend on common.
> * integration-test - Contains utilities and classes needed to run our
>   integration tests (in memory components for example). Will depend on
>   common and test.
> * dataload - Contains all code related to data loading. Will also include
>   any property files needed and integration tests. Will depend on common,
>   test (test scope), and integration-test (test scope).
> * parser - All code specific to the parser topologies.
>   Would also include scripts, property files, flux files and parser topology
>   integration tests. This project will depend on common, test (test scope),
>   and integration-test (test scope).
> * enrichment - All code specific to the enrichment topologies (except solr
>   and elasticsearch). Would also include scripts, property files, flux files
>   and enrichment topology integration tests. This project will depend on
>   common, test (test scope), and integration-test (test scope).
> * elasticsearch - All Elasticsearch related code. Will depend on
>   enrichment.
> * solr - All Solr related code. Will depend on enrichment.
> * pcap - All code specific to the topology dedicated to pcap. Would also
>   include scripts, property files, flux files and pcap integration tests.
>   This project will depend on common, test (test scope) and integration-test
>   (test scope).
> * api - This will serve as a generic replacement for Metron-Pcap_Service.
>   Will contain all code to build a Metron web service middle layer that can
>   expose APIs through REST or other client protocols. Could possibly depend
>   on all other projects or separated further if version conflicts arise
>   (separate api projects for solr and elasticsearch for example).
>
>Looking forward to hearing everyone's feedback and great ideas.
>
>Ryan Merriman
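
As a rough way to sanity-check the dependency tree in the quoted proposal, the sketch below writes the listed module dependencies down as a graph and derives one valid build order from it. The module names and edges come from the proposal; everything else is an assumption for illustration: the `api` entry reflects only the "could possibly depend on all other projects" remark, test scope versus compile scope is not modeled, and the helper names `build_order` and `transitive_deps` are hypothetical. It uses Python's standard `graphlib` module (Python 3.9+), not Maven itself.

```python
from graphlib import TopologicalSorter

# Direct dependencies as listed in the quoted proposal.
# Scope (compile vs. test) is deliberately not modeled here.
DEPENDS_ON = {
    "common": [],
    "test": ["common"],
    "integration-test": ["common", "test"],
    "dataload": ["common", "test", "integration-test"],
    "parser": ["common", "test", "integration-test"],
    "enrichment": ["common", "test", "integration-test"],
    "elasticsearch": ["enrichment"],
    "solr": ["enrichment"],
    "pcap": ["common", "test", "integration-test"],
}

# Assumption: the proposal only says api "could possibly" depend on
# all other projects, so this edge set is illustrative.
DEPENDS_ON["api"] = [name for name in DEPENDS_ON if name != "api"]


def build_order(depends_on):
    """Return one valid order in which every module follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def transitive_deps(module, depends_on):
    """Return every module reachable from `module` by following dependencies."""
    seen = set()
    stack = list(depends_on[module])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends_on[dep])
    return seen


if __name__ == "__main__":
    print(" -> ".join(build_order(DEPENDS_ON)))
    # solr and elasticsearch should never end up on each other's classpath.
    print("solr pulls in elasticsearch:",
          "elasticsearch" in transitive_deps("solr", DEPENDS_ON))
```

A check like `transitive_deps` is one way to confirm that the solr and elasticsearch modules stay isolated from each other while still sharing the common enrichment code, which is the constraint the Lucene conflict imposes.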
