My goal for the submodule refactor is pretty straightforward: I want to separate the project into pirk-core, pirk-hadoop, pirk-spark, and pirk-storm. I think separating pirk-core and pirk-hadoop is very ambitious at this point, as there are a lot of dependencies we'd need to resolve. pirk-storm and pirk-spark would be much more reasonable places to start. I'd also recommend we do something about the Elasticsearch dependency; it seems more like an InputFormat option than part of pirk-core.
There are a few blockers to this.

The first is PIRK-63: the ResponderDriver was calling the Responder class of each specific framework. That fix is straightforward: pass the class as an argument. I've started that here: https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (I expected to have the PR up earlier, but I hit a rebase issue and didn't get around to completing a few bits). It also makes it possible, at least at a rudimentary level, to add new responders by putting jars on the classpath rather than recompiling Pirk. I'm open to suggestions here. I think it's very likely ResponderLauncher isn't needed and run could instead be a static member of another class; however, based on what was in ResponderDriver, this seems to be the approach with the fewest issues, especially for Storm.

Another is how we're passing the command line options in ResponderCLI: we're defining framework-specific elements in the Driver, which are then passed to the underlying framework Driver/Topology/ToolRunner. This is more difficult to address cleanly, so it seems like a good place to start a discussion. I think this mechanism should be addressed, though, as putting in options for every framework/InputFormat anyone could want is untenable.

After addressing these two, based on some experiments, it looks like breaking out Storm is pretty straightforward, and Spark should be about the same. I'm still looking at Elasticsearch. Hadoop would require more work and is, I think, less important for now.

I also realize there are other ways to break the modules apart, and I'm mostly discussing modularizing the responder package; however, that's where most of the dependencies lie, so I think that's where we'll get the most impact.

Darin

On Wed, Sep 14, 2016 at 8:52 AM, Suneel Marthi <[email protected]> wrote:

> +1 to start a sub-thread. I would suggest to start a shared Google Doc for
> dumping ideas and evolving a structure.
>
> On Wed, Sep 14, 2016 at 2:48 PM, Ellison Anne Williams <[email protected]> wrote:
>
> > Starting a new thread to discuss the Pirk submodule refactor (so that we
> > don't get too mixed up with the 'Next short term goal?' thread)...
> >
> > Darin - Thanks for jumping in on the last email (I think that we hit send
> > at exactly the same time :)). Can you describe what you have in mind for
> > the submodule refactor so that we can discuss?
> >
> > (No, there is not an umbrella JIRA for producing separate Responder
> > jars - please feel free to go ahead and add one)
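
P.S. To make the PIRK-63 idea a bit more concrete, here is a rough sketch of what "pass the class as an argument" could look like. It is not the code in the branch linked above: the run() signature on ResponderLauncher is my assumption, and ExampleStormLauncher and ResponderDriverSketch are hypothetical names used only for illustration. The point is just that the driver resolves the launcher by name at runtime, so it needs no compile-time dependency on any framework module.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the launcher contract; the real interface may differ.
interface ResponderLauncher {
    void run() throws Exception;
}

// Hypothetical stand-in for a framework-specific launcher that would
// live in its own module (e.g. pirk-storm) and ship as a separate jar.
class ExampleStormLauncher implements ResponderLauncher {
    static final List<String> LOG = new ArrayList<>();

    @Override
    public void run() {
        LOG.add("storm responder started");
    }
}

// Hypothetical driver: knows only the interface, never the implementations.
public class ResponderDriverSketch {

    // Look the class up on the classpath and instantiate it reflectively.
    static ResponderLauncher load(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        if (!ResponderLauncher.class.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(
                className + " does not implement ResponderLauncher");
        }
        return (ResponderLauncher) clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In practice the class name would come from the command line;
        // the fallback here only keeps the sketch runnable on its own.
        String launcherClass =
            args.length > 0 ? args[0] : ExampleStormLauncher.class.getName();
        load(launcherClass).run();
        System.out.println(ExampleStormLauncher.LOG);
    }
}
```

With something along these lines, adding a responder for a new framework would mean dropping a jar containing another ResponderLauncher implementation on the classpath and passing its class name, with no change to the driver.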
