On Thu, Sep 15, 2016 at 9:25 AM, Tim Ellison <t.p.elli...@gmail.com> wrote:
> On 15/09/16 09:21, Darin Johnson wrote:
> > So my goal for the submodule refactor is pretty straightforward: I
> > basically want to separate the project into pirk-core, pirk-hadoop,
> > pirk-spark, and pirk-storm. I think separating pirk-core and
> > pirk-hadoop is very ambitious at this point, as there are a lot of
> > dependencies we'd need to resolve.
>
> I think it is quite do-able, but agree that it is more work than the
> others.
>
> > pirk-storm and pirk-spark would be much more reasonable starts. I'd
> > also recommend we do something about the elastic-search dependency;
> > it seems more of an InputFormat option than part of pirk-core.
> >
> > There are a few blockers to this:
> >
> > The first is PIRK-63: here the ResponderDriver was calling the
> > Responder class of each specific framework. That fix is
> > straightforward: pass the class as an argument. I've started that
> > here: https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (a PR was
> > expected earlier, but I had a rebase issue, so I didn't get around to
> > completing a few bits). It also allows, at least at a rudimentary
> > level, adding new responders by putting jars on the classpath rather
> > than recompiling Pirk. I'm open to suggestions here. I think it's
> > very likely ResponderLauncher isn't needed and instead run could be a
> > static member of another class; however, based on what was in
> > ResponderDriver, this seems to be the approach with the fewest
> > issues, especially for Storm.
>
> Give a shout when you want somebody to take a look.
>
> > Another is how we're passing the command line options in
> > ResponderCLI: here we're defining framework-specific elements in the
> > Driver, which are then passed to the underlying framework
> > Driver/Topology/ToolRunner. This becomes more difficult to address
> > cleanly, so it seems like a good place to start a discussion.
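As an aside on the PIRK-63 approach: the "jars on the classpath instead of recompiling" idea could be sketched roughly as below. This is illustrative only; the `ResponderLauncher` interface here is a minimal stand-in, and `launch`, `StormResponderLauncher`, and the class-name flag are hypothetical names, not Pirk's actual API.

```java
// Sketch (hypothetical names): a driver that only knows a class name
// (e.g. from a CLI flag) and loads the responder via reflection, so new
// responders can ship as separate jars on the classpath.
public class LauncherDemo {
  // Minimal stand-in for the ResponderLauncher interface mentioned in PIRK-63.
  public interface ResponderLauncher {
    String run();
  }

  // Example plugin; in practice this would live in a framework-specific jar.
  public static class StormResponderLauncher implements ResponderLauncher {
    public String run() { return "storm responder started"; }
  }

  // The driver never references a concrete framework class at compile time.
  static String launch(String className) throws Exception {
    Class<?> cls = Class.forName(className);
    ResponderLauncher launcher =
        (ResponderLauncher) cls.getDeclaredConstructor().newInstance();
    return launcher.run();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(launch("LauncherDemo$StormResponderLauncher"));
  }
}
```

One consequence of this design is that the storm/spark/hadoop modules only need to depend on the interface (in pirk-core), not the other way around.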
> > I think this mechanism should be addressed, though, as putting
> > options for every framework/InputFormat everyone could want is
> > untenable.
>
> I guess one option is to structure the monolithic CLI around plug-ins,
> so rather than today's
>
>   ResponderDriver <options for everything> ...
>
> it would become
>
>   ResponderDriver --pir embedSelector=true --storm option=value ...
>
> and so on; or more likely
>
>   ResponderDriver --pir optionsFile=pir.properties --storm optionsFile=storm.properties ...
>
> and then the driver can delegate each command line option group to the
> correct handler.

Agree with this approach. As the CLI already supports reading all of the
properties from properties files (both local and in HDFS), it should be
relatively straightforward to delegate the handling.

> > After addressing these two, based on some experiments, it looks like
> > breaking out Storm is pretty straightforward, and Spark should be
> > about the same. I'm still looking at elastic search. Hadoop would
> > require more, and I think it is less important for now.
>
> Much of the Hadoop dependency I see is 'services' for storing and
> retrieving; these could be abstracted out to a provider model.

Agreed.

> > I also realize there are other ways to break the modules apart, and
> > I'm mostly discussing modularizing the responder package; however,
> > that's where most of the dependencies lie, so I think that's where
> > we'll get the most impact.
>
> +1, the responder and CLI.
>
> Regards,
> Tim
>
> > On Wed, Sep 14, 2016 at 8:52 AM, Suneel Marthi <suneel.mar...@gmail.com>
> > wrote:
> >
> >> +1 to start a sub-thread. I would suggest to start a shared Google
> >> Doc for dumping ideas and evolving a structure.
> >>
> >> On Wed, Sep 14, 2016 at 2:48 PM, Ellison Anne Williams <
> >> eawilli...@apache.org> wrote:
> >>
> >>> Starting a new thread to discuss the Pirk submodule refactor (so
> >>> that we don't get too mixed up with the 'Next short term goal?'
> >>> thread)...
> >>>
> >>> Darin - Thanks for jumping in on the last email (I think that we
> >>> hit send at exactly the same time :)). Can you describe what you
> >>> have in mind for the submodule refactor so that we can discuss?
> >>>
> >>> (No, there is not an umbrella JIRA for producing separate Responder
> >>> jars - please feel free to go ahead and add one)
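For the record, Tim's option-group delegation above could be sketched along these lines. This is only a rough illustration of the idea, not Pirk's actual parser; `GroupedCli`, `parse`, and the example flags are hypothetical names.

```java
// Sketch (illustrative only): split the command line into option groups
// keyed by a leading "--framework" flag, so the driver can hand each
// group to the correct framework handler without knowing its options.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class GroupedCli {
  // Parse e.g. ["--pir", "embedSelector=true", "--storm", "workers=4"]
  // into {pir={embedSelector=true}, storm={workers=4}}.
  static Map<String, Properties> parse(String[] args) {
    Map<String, Properties> groups = new LinkedHashMap<>();
    Properties current = null;
    for (String arg : args) {
      if (arg.startsWith("--")) {
        // A new "--name" flag starts a fresh option group.
        current = new Properties();
        groups.put(arg.substring(2), current);
      } else if (current != null && arg.contains("=")) {
        // key=value pairs accumulate into the current group.
        String[] kv = arg.split("=", 2);
        current.setProperty(kv[0], kv[1]);
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    Map<String, Properties> groups =
        parse(new String[] {"--pir", "embedSelector=true", "--storm", "workers=4"});
    // Each framework handler would receive only its own Properties group;
    // an optionsFile=... entry could likewise be loaded and merged here.
    System.out.println(groups.get("storm").getProperty("workers"));
  }
}
```

Since the CLI already reads properties files, each group's handler could treat `optionsFile=...` as just another key and merge the file's contents into its own group.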