Re: [DISCUSS] Project reorganization

James Sirota Wed, 13 Apr 2016 07:43:49 -0700

I would not have configs as a project, but rather as a folder structure that
other modules can point to.


Thanks,
James 




On 4/13/16, 7:32 AM, "Ryan Merriman" <[email protected]> wrote:

>James brings up a good point.  I propose adding another project under
>metron-platform called metron-configuration.  This would be a fairly
>lightweight project that would contain anything related to configuration
>(property files, json files, flux files, etc).
>
>On 4/13/16, 8:56 AM, "James Sirota" <[email protected]> wrote:
>
>>+1 from me.
>>
>>I would also like to address the configs and make sure the configs are in
>>the same place.  Do you have ideas on where we would put those?
>>
>>Thanks,
>>James 
>>
>>
>>
>>On 4/13/16, 6:50 AM, "Ryan Merriman" <[email protected]> wrote:
>>
>>>Thank you for all the feedback everyone.  I will attempt to summarize all
>>>the input we've received and update my initial proposal.  We can discuss
>>>further if anyone is still unclear and I will volunteer to capture all
>>>the details in a document of some kind once we all come to a consensus.
>>>
>>>Looks like everyone is in agreement for the top level projects.  Nick is
>>>working on a task that will require an additional top-level project so I am
>>>going to add that in as well:
>>>
>>>metron-deployment
>>>metron-platform
>>>metron-ui
>>>metron-sensors
>>>
>>>All of these except metron-platform are well understood and don't warrant
>>>any more discussion.  For metron-platform there seem to be 2 areas that
>>>are not as clear:
>>>
>>>- whether we need a common project
>>>- how we organize test-related code
>>>
>>>I agree with David and others that a common project will likely get
>>>misused and could become unnecessarily bloated.  But I suspect there will
>>>be cases where we have common code being used across multiple projects
>>>(this is already happening).  In this case we will either need this common
>>>project or we will have to keep common code in one of the other projects
>>>and have all other projects depend on that.  For the latter, an example
>>>would be keeping common code in enrichment and having parsers declare
>>>enrichment as a dependency.  There are a couple of downsides I see with
>>>this approach:
>>>
>>>- parser topology jars now bring along all the enrichment dependencies
>>>- since more code from various projects is being packaged together,
>>>version conflicts are more likely and poms become more complicated due to
>>>all the necessary exclusions
>>>
>>>My thinking is that any jar file being deployed should only contain what
>>>it needs.  Curious what others think here.  My vote would be to maintain
>>>a common project (or whatever we want to call it) and be diligent about
>>>not letting project-specific code slip in there.
>>>
>>>I believe Nick was the first person to ask the question about projects
>>>related to test code and why we would need separate test and integration
>>>test projects.  The reason for this is that our integration-test classes
>>>currently depend on other projects (not surprising since they are
>>>integration tests).  If there are utilities we want to make available to
>>>all projects (mock classes, utilities for reading sample data, etc.) then
>>>they can't live in integration-test because that would introduce circular
>>>dependencies.  If it is possible to refactor our current Metron-Testing
>>>project so that it doesn't depend on any other projects, then we can keep
>>>the utilities there.  Otherwise we need a separate project for testing
>>>utilities.  I suspect removing other project dependencies from
>>>Metron-Testing will prove more difficult than it's worth, so my vote would
>>>be to have 2 test-related projects.
>>>
>>>So here is where our metron-platform organization stands:
>>>
>>>metron-common *
>>>metron-integration-test *
>>>metron-test-utilities *
>>>metron-data-management
>>>metron-pcap
>>>metron-parsers
>>>metron-enrichment
>>>     metron-solr
>>>     metron-elasticsearch
>>>metron-api
>>>
>>>* may or may not change depending on the outcome of this discussion
>>>
>>>Thoughts?
>>>
>>>Ryan Merriman
>>>
>>>
>>>On 4/11/16, 4:15 PM, "Debojyoti Dutta" <[email protected]> wrote:
>>>
>>>>If you load up your Irc client just type
>>>>/join #apache-metron-dev
>>>>
>>>>Sent from my iPhone
>>>>
>>>>> On Apr 11, 2016, at 12:06 PM, James Sirota <[email protected]>
>>>>>wrote:
>>>>> 
>>>>> Great, thanks, Debo.  Where can I find instructions on how to get to
>>>>>it?
>>>>> 
>>>>> Thanks,
>>>>> James 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 4/11/16, 9:41 AM, "Debo Dutta (dedutta)" <[email protected]>
>>>>>>wrote:
>>>>>> 
>>>>>> Hi James 
>>>>>> 
>>>>>> Ok set it up and ack ...
>>>>>> 
>>>>>> Thx
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 4/10/16, 6:31 PM, "James Sirota" <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi Debo,
>>>>>>> 
>>>>>>> I think it would be great if you set it up
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> James 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 4/10/16, 6:25 PM, "Debojyoti Dutta" <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> I have set it up for another open source effort in the past and it
>>>>>>>>was not very hard. Am happy to volunteer if needed.
>>>>>>>> 
>>>>>>>> Thx 
>>>>>>>> Debo
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>>>>> 
>>>>>>>>> On Apr 10, 2016, at 5:53 PM, James Sirota
>>>>>>>>><[email protected]>
>>>>>>>>>wrote:
>>>>>>>>> 
>>>>>>>>> I'd be open to an IRC channel.  Does anyone know if Apache allows
>>>>>>>>>this?  If yes, does anyone know how to set one up?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> James 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 4/10/16, 4:52 PM, "Debojyoti Dutta" <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Nick 
>>>>>>>>>> 
>>>>>>>>>> I like your suggestions. For the enrichment layer, do you think it
>>>>>>>>>> would also include any advanced analytics? Otherwise we might want
>>>>>>>>>> to have an analytics layer.
>>>>>>>>>> 
>>>>>>>>>> It would be good to have an arch which could be extended for new
>>>>>>>>>>functionality.
>>>>>>>>>> 
>>>>>>>>>> However Ryan's suggestion of the ui API and deployer also makes
>>>>>>>>>>sense. 
>>>>>>>>>> 
>>>>>>>>>> Should we have an IRC channel to discuss this or maybe etherpad?
>>>>>>>>>> 
>>>>>>>>>> Debo
>>>>>>>>>> 
>>>>>>>>>> Sent from my iPhone
>>>>>>>>>> 
>>>>>>>>>>> On Apr 10, 2016, at 4:36 PM, Nick Allen <[email protected]>
>>>>>>>>>>>wrote:
>>>>>>>>>>> 
>>>>>>>>>>> It might help to think of our code base as four separate types of
>>>>>>>>>>> functionality.  This is primarily meant to give us a framework to
>>>>>>>>>>> think about the organization of Metron (and drive more discussion),
>>>>>>>>>>> rather than my proposal for a specific structure.
>>>>>>>>>>> 
>>>>>>>>>>> - Sensor - Anything that captures external, non-streaming data and
>>>>>>>>>>> presents it in a form ready for stream processing.
>>>>>>>>>>> - Input - Responsible for preparing streaming data for enrichment.
>>>>>>>>>>> The existing "parsers" fit neatly into this space.
>>>>>>>>>>> - Enrichment - Responsible for enriching an incoming data feed with
>>>>>>>>>>> things like geoip, asset enrichment, threat intel lookups, etc.
>>>>>>>>>>> - Output - Responsible for persisting data that has been processed
>>>>>>>>>>> by Metron, which obviously means search indexers or data stores.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Apr 8, 2016 at 4:46 PM, Ryan Merriman
>>>>>>>>>>><[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> All,
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to propose a review and refactor of the current
>>>>>>>>>>>> project organization within Metron.  Much of the way the legacy
>>>>>>>>>>>> code was organized does not make sense anymore and could be
>>>>>>>>>>>> designed so that it is easier to navigate and understand.  Our
>>>>>>>>>>>> test coverage has increased substantially so I believe we can do
>>>>>>>>>>>> this with confidence.
>>>>>>>>>>>> 
>>>>>>>>>>>> First off, I think we should agree on a naming convention.  I see
>>>>>>>>>>>> some projects (YARN and Storm for example) that prepend the
>>>>>>>>>>>> sub-project with the name of the top-level project (storm-core
>>>>>>>>>>>> for example).  Metron also currently does this (Metron-Common).
>>>>>>>>>>>> I think that's fine, although in the case of Metron, I feel like
>>>>>>>>>>>> having "Metron" prepended is redundant.  Regardless of whether we
>>>>>>>>>>>> decide to stick with that approach, I propose that project names
>>>>>>>>>>>> be uniform and lowercase.  For example, under these assumptions
>>>>>>>>>>>> "Metron-Common" would change to "common".
>>>>>>>>>>>> 
>>>>>>>>>>>> The first level of organization makes sense to me.  Only change I
>>>>>>>>>>>> would make would be to project names:
>>>>>>>>>>>> 
>>>>>>>>>>>> *   deployment
>>>>>>>>>>>> *   streaming
>>>>>>>>>>>> *   ui
>>>>>>>>>>>> 
>>>>>>>>>>>> Or if we want to keep metron in project names:
>>>>>>>>>>>> 
>>>>>>>>>>>> *   metron-deployment
>>>>>>>>>>>> *   metron-streaming
>>>>>>>>>>>> *   metron-ui
>>>>>>>>>>>> 
>>>>>>>>>>>> For now I don't see any changes necessary in deployment or ui
>>>>>>>>>>>> organization.  I see the streaming project structure primarily
>>>>>>>>>>>> driven by 2 things:  the Maven dependency tree and deployment
>>>>>>>>>>>> targets.  For example, solr and elasticsearch code should be
>>>>>>>>>>>> separated (because their dependency on lucene conflicts) but both
>>>>>>>>>>>> will depend on common enrichment code.  Also, now that parser,
>>>>>>>>>>>> enrichment and pcap topologies are separate, code for those
>>>>>>>>>>>> topologies will be deployed as separate jars.  No reason to
>>>>>>>>>>>> include parser code in enrichment topologies and vice-versa.  Any
>>>>>>>>>>>> other considerations I'm missing?
>>>>>>>>>>>> 
>>>>>>>>>>>> With that being said, here is my initial proposal:
>>>>>>>>>>>> 
>>>>>>>>>>>> *   common - Any common code that all topologies depend on
>>>>>>>>>>>> (configuration classes, generic writers for example).  No
>>>>>>>>>>>> dependencies on other Metron projects.
>>>>>>>>>>>> *   test - Contains utilities for writing unit tests, sample
>>>>>>>>>>>> configs and sample data.  Will depend on common.
>>>>>>>>>>>> *   integration-test - Contains utilities and classes needed to
>>>>>>>>>>>> run our integration tests (in memory components for example).
>>>>>>>>>>>> Will depend on common and test.
>>>>>>>>>>>> *   dataload - Contains all code related to data loading.  Will
>>>>>>>>>>>> also include any property files needed and integration tests.
>>>>>>>>>>>> Will depend on common, test (test scope), and integration-test
>>>>>>>>>>>> (test scope).
>>>>>>>>>>>> *   parser - All code specific to the parser topologies.  Would
>>>>>>>>>>>> also include scripts, property files, flux files and parser
>>>>>>>>>>>> topology integration tests.  This project will depend on common,
>>>>>>>>>>>> test (test scope), and integration-test (test scope).
>>>>>>>>>>>> *   enrichment - All code specific to the enrichment topologies
>>>>>>>>>>>> (except solr and elasticsearch).  Would also include scripts,
>>>>>>>>>>>> property files, flux files and enrichment topology integration
>>>>>>>>>>>> tests.  This project will depend on common, test (test scope),
>>>>>>>>>>>> and integration-test (test scope).
>>>>>>>>>>>> *   elasticsearch - All Elasticsearch related code.  Will depend
>>>>>>>>>>>> on enrichment.
>>>>>>>>>>>> *   solr - All Solr related code.  Will depend on enrichment.
>>>>>>>>>>>> *   pcap - All code specific to the topology dedicated to pcap.
>>>>>>>>>>>> Would also include scripts, property files, flux files and pcap
>>>>>>>>>>>> integration tests.  This project will depend on common, test
>>>>>>>>>>>> (test scope) and integration-test (test scope).
>>>>>>>>>>>> *   api - This will serve as a generic replacement for
>>>>>>>>>>>> Metron-Pcap_Service.  Will contain all code to build a Metron web
>>>>>>>>>>>> service middle layer that can expose APIs through REST or other
>>>>>>>>>>>> client protocols.  Could possibly depend on all other projects or
>>>>>>>>>>>> separated further if version conflicts arise (separate api
>>>>>>>>>>>> projects for solr and elasticsearch for example).
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking forward to hearing everyone's feedback and great ideas.
>>>>>>>>>>>> 
>>>>>>>>>>>> Ryan Merriman
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -- 
>>>>>>>>>>> Nick Allen <[email protected]>
>>>>>>>> 
>>>>
>>>
>>>
>
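
The two dependency concerns raised in the quoted discussion (circular
dependencies between projects, and jars pulling in more than they need) can
be checked mechanically.  What follows is only a minimal Python sketch under
stated assumptions, not anything from Metron's actual build: the PROPOSED map
transcribes the "depends on" statements from the initial proposal, Maven
scopes are ignored, the api project is left out because its dependencies were
left open, and the names find_cycle and transitive_deps are hypothetical
helpers invented for this illustration.

```python
from typing import Dict, List, Optional, Set

# Assumption: module -> direct dependencies, transcribed from the initial
# proposal in the thread. Maven scopes (e.g. "test scope") are ignored here.
PROPOSED: Dict[str, List[str]] = {
    "common": [],
    "test": ["common"],
    "integration-test": ["common", "test"],
    "dataload": ["common", "test", "integration-test"],
    "parser": ["common", "test", "integration-test"],
    "enrichment": ["common", "test", "integration-test"],
    "elasticsearch": ["enrichment"],
    "solr": ["enrichment"],
    "pcap": ["common", "test", "integration-test"],
}


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(module: str) -> Optional[List[str]]:
        color[module] = GREY
        path.append(module)
        for dep in deps.get(module, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[module] = BLACK
        return None

    for module in deps:
        if color.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


def transitive_deps(deps: Dict[str, List[str]], module: str) -> Set[str]:
    """Return every module reachable from `module` through its dependencies."""
    seen: Set[str] = set()
    todo = list(deps.get(module, []))
    while todo:
        dep = todo.pop()
        if dep not in seen:
            seen.add(dep)
            todo.extend(deps.get(dep, []))
    return seen


# Hypothetical variant 1: shared test utilities stay in integration-test,
# which itself depends on parser (as the current integration tests depend on
# other projects). parser already depends on integration-test, so a loop forms.
WITH_CYCLE = dict(PROPOSED, **{"integration-test": ["common", "test", "parser"]})

# Hypothetical variant 2: no common project use for shared code; it lives in
# enrichment instead, so parser must declare enrichment as a dependency.
PARSER_ON_ENRICHMENT = dict(PROPOSED, parser=PROPOSED["parser"] + ["enrichment"])
```

With these definitions, find_cycle(PROPOSED) returns None, while
find_cycle(WITH_CYCLE) reports a loop through integration-test and parser.
Comparing transitive_deps(PROPOSED, "parser") with
transitive_deps(PARSER_ON_ENRICHMENT, "parser") shows enrichment being pulled
into the parser's dependency set in the second variant, which is the jar
bloat concern raised above.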
