On 15/09/16 09:21, Darin Johnson wrote:
> So my goal for the submodule refactor is pretty straightforward, I
> basically want to separate the project into: pirk-core, pirk-hadoop,
> pirk-spark, and pirk-storm. I think separating pirk-core and pirk-hadoop
> is very ambitious at this point as there's a lot of dependencies we'd need
> to resolve.

I think it is quite doable, but agree that it is more work than the others.

> pirk-storm and pirk-spark would be much more reasonable
> starts. I'd also recommend we do something about the elastic-search
> dependency, it seems more of an InputFormat option than part of pirk-core.
>
> There's a few blockers to this:
>
> The first is PIRK-63, here the ResponderDriver was calling the Responder
> class of each specific framework. That fix is straightforward: pass the
> class as an argument. I've started that here:
> https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (PR was expected
> earlier - but had a rebase issue - so didn't get around to completing a few
> bits). It also allows, at least at a rudimentary level, adding new
> responders by putting jars on the classpath vs recompiling pirk. I'm open
> to suggestions here - I think it's very likely ResponderLauncher isn't
> needed and instead run could be a static member of another class, however
> based off what was in ResponderDriver this seems to be the approach with
> the fewest issues - especially for storm.

Give a shout when you want somebody to take a look.

> Another is how we're passing the command line options in ResponderCLI, here
> we're defining framework-specific elements to the Driver which are then
> passed to the underlying framework Driver/Topology/ToolRunner. This
> becomes more difficult to address cleanly so seems like a good place to
> start a discussion. I think this mechanism should be addressed though, as
> putting in options for every framework/inputformat everyone could want is
> untenable.

I guess one option is to structure the monolithic CLI around plug-ins, so
rather than today's

    ResponderDriver <options for everything> ...

it would become

    ResponderDriver --pir embedSelector=true --storm option=value ...

and so on; or more likely

    ResponderDriver --pir optionsFile=pir.properties --storm optionsFile=storm.properties ...

and then the driver can delegate each command line option group to the
correct handler.

> After addressing these two, based off some experiments it looks like
> breaking out storm is pretty straightforward and spark should be about the
> same. I'm still looking at elastic search. Hadoop would require more and
> I think is less important for now.

Much of the Hadoop dependency I see is 'services' for storing and
retrieving; these could be abstracted out to a provider model.

> I also realize there are other ways to break the modules apart and I'm
> mostly discussing modularizing the responder package, however that's where
> most of the dependencies lie so I think that's where we'll get the most
> impact.

+1, the responder and CLI.

Regards,
Tim

> On Wed, Sep 14, 2016 at 8:52 AM, Suneel Marthi <[email protected]>
> wrote:
>
>> +1 to start a sub-thread. I would suggest to start a shared Google Doc for
>> dumping ideas and evolving a structure.
>>
>> On Wed, Sep 14, 2016 at 2:48 PM, Ellison Anne Williams <
>> [email protected]> wrote:
>>
>>> Starting a new thread to discuss the Pirk submodule refactor (so that we
>>> don't get too mixed up with the 'Next short term goal?' thread)...
>>>
>>> Darin - Thanks for jumping in on the last email (I think that we hit send
>>> at exactly the same time :)). Can you describe what you have in mind for
>>> the submodule refactor so that we can discuss?
>>>
>>> (No, there is not an umbrella JIRA for producing separate Responder jars -
>>> please feel free to go ahead and add one)
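
For discussion, the option-group idea above (ResponderDriver --pir ... --storm ...) could look roughly like the sketch below. It is only an illustration of the delegation: the OptionHandler interface, the class name and the key=value syntax are assumptions made up for this example, not anything that exists in Pirk today, and the optionsFile variant is not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionGroupDemo {

    /** Hypothetical plug-in contract: one handler per option group (pir, storm, ...). */
    public interface OptionHandler {
        void handle(Map<String, String> options);
    }

    /** Splits args such as "--pir a=b --storm c=d" into one key/value map per group. */
    public static Map<String, Map<String, String>> parseGroups(String[] args) {
        Map<String, Map<String, String>> groups = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String arg : args) {
            if (arg.startsWith("--")) {
                // A "--name" token opens (or reopens) the group called "name".
                current = groups.computeIfAbsent(arg.substring(2), k -> new LinkedHashMap<>());
            } else {
                if (current == null) {
                    throw new IllegalArgumentException("Option outside any group: " + arg);
                }
                int eq = arg.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Expected key=value but got: " + arg);
                }
                current.put(arg.substring(0, eq), arg.substring(eq + 1));
            }
        }
        return groups;
    }

    /** Hands each option group to the handler registered under the same name. */
    public static void dispatch(Map<String, Map<String, String>> groups,
                                Map<String, OptionHandler> handlers) {
        for (Map.Entry<String, Map<String, String>> group : groups.entrySet()) {
            OptionHandler handler = handlers.get(group.getKey());
            if (handler == null) {
                throw new IllegalArgumentException("No handler for option group: " + group.getKey());
            }
            handler.handle(group.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, OptionHandler> handlers = new LinkedHashMap<>();
        handlers.put("pir", opts -> System.out.println("pir options: " + opts));
        handlers.put("storm", opts -> System.out.println("storm options: " + opts));
        dispatch(parseGroups(args), handlers);
    }
}
```

With this shape the driver itself knows nothing about framework-specific options; each framework module would register its own handler and an unknown group fails fast.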

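
Similarly, the PIRK-63 approach quoted above (pass the responder class as an argument so new responders can be added by putting jars on the classpath) might be sketched as follows. The ResponderLauncher interface here is a stand-in written for this example; the real one in Darin's branch may well have a different shape, and StandaloneLauncher is purely hypothetical.

```java
public class LauncherDemo {

    /** Stand-in launcher contract (assumption); the actual ResponderLauncher may differ. */
    public interface ResponderLauncher {
        String run();
    }

    /** Hypothetical launcher used only to exercise the loader. */
    public static class StandaloneLauncher implements ResponderLauncher {
        @Override
        public String run() {
            return "standalone responder started";
        }
    }

    /** Loads a launcher by fully qualified class name from whatever is on the classpath. */
    public static ResponderLauncher load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            if (!ResponderLauncher.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " is not a ResponderLauncher");
            }
            // Requires an accessible no-argument constructor on the launcher class.
            return cls.asSubclass(ResponderLauncher.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load launcher: " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name would normally come from the command line or a properties file.
        String name = args.length > 0 ? args[0] : StandaloneLauncher.class.getName();
        System.out.println(load(name).run());
    }
}
```

The driver then depends only on the interface, so a Storm or Spark launcher could live in its own module without pirk-core referencing it at compile time.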