Reposting the Google Doc to this thread for cohesion: https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_mMrRQyynQ-Q6MFbI/edit?usp=sharing
If there are no issues, I'd like to start this. Since it involves a lot of file moves (which are a pain to revise), my plan is to break it into a few modules at a time. That should make the reviews and testing easier as well.

On Sep 17, 2016 8:54 AM, "Darin Johnson" <[email protected]> wrote:

> Great
>
> Will have PIRK-63 sometime this weekend, which will help. Then go ahead
> with these suggestions as a base; I may come back with some thoughts about
> the CLI. I'd like for new responders not to modify pirk-core. There are a
> few ways I've done this before, but I need to decide which will be least
> intrusive and easiest to maintain.
>
> Darin
>
> On Sep 15, 2016 6:17 PM, "Ellison Anne Williams" <[email protected]> wrote:
>
>> On Thu, Sep 15, 2016 at 9:25 AM, Tim Ellison <[email protected]> wrote:
>>
>>> On 15/09/16 09:21, Darin Johnson wrote:
>>>
>>>> My goal for the submodule refactor is pretty straightforward: I
>>>> basically want to separate the project into pirk-core, pirk-hadoop,
>>>> pirk-spark, and pirk-storm. I think separating pirk-core and
>>>> pirk-hadoop is very ambitious at this point, as there are a lot of
>>>> dependencies we'd need to resolve.
>>>
>>> I think it is quite doable, but agree that it is more work than the
>>> others.
>>>
>>>> pirk-storm and pirk-spark would be much more reasonable starts. I'd
>>>> also recommend we do something about the elasticsearch dependency; it
>>>> seems more of an InputFormat option than part of pirk-core.
>>>>
>>>> There are a few blockers to this:
>>>>
>>>> The first is PIRK-63: the ResponderDriver was calling the Responder
>>>> class of each specific framework. That fix is straightforward - pass
>>>> the class as an argument. I've started that here:
>>>> https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (the PR was
>>>> expected earlier, but I had a rebase issue, so I didn't get around to
>>>> completing a few bits).
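For illustration, the PIRK-63 fix described above (pass the responder class as an argument and instantiate it reflectively) might look roughly like the sketch below. The `ResponderLauncher` interface shape and all class names here are assumptions based on the thread, not Pirk's actual API.

```java
// Hypothetical sketch: instead of a hard-coded switch over frameworks,
// the driver receives a launcher class name and loads it reflectively,
// so a new responder only needs its jar on the classpath.
public class ResponderDriverSketch
{
  /** Minimal launcher contract each framework module would implement. */
  public interface ResponderLauncher
  {
    void run() throws Exception;
  }

  /** Stand-in for a launcher that a pirk-storm style module might provide. */
  public static class DemoLauncher implements ResponderLauncher
  {
    public static boolean ran = false;

    @Override
    public void run()
    {
      ran = true; // a real launcher would start the framework topology here
    }
  }

  /** Load and run a launcher given its fully qualified class name. */
  public static void launch(String className) throws Exception
  {
    Class<?> clazz = Class.forName(className);
    ResponderLauncher launcher = (ResponderLauncher) clazz.getDeclaredConstructor().newInstance();
    launcher.run();
  }

  public static void main(String[] args) throws Exception
  {
    // In the real driver the class name would come from the CLI, e.g.
    // -launcher org.apache.pirk.responder.storm.StormLauncher (hypothetical).
    launch(ResponderDriverSketch.class.getName() + "$DemoLauncher");
    System.out.println("ran = " + DemoLauncher.ran);
  }
}
```

This keeps pirk-core ignorant of the concrete frameworks; whether `run` lives on an interface or as a static member of another class (as Darin wonders) only changes the reflective call, not the decoupling.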
>>>> It also allows, at least at a rudimentary level, adding new responders
>>>> by putting jars on the classpath instead of recompiling Pirk. I'm open
>>>> to suggestions here - I think it's very likely ResponderLauncher isn't
>>>> needed and run could instead be a static member of another class, but
>>>> based on what was in ResponderDriver this seems to be the approach with
>>>> the fewest issues - especially for Storm.
>>>
>>> Give a shout when you want somebody to take a look.
>>>
>>>> Another is how we're passing the command line options in ResponderCLI:
>>>> here we're defining framework-specific elements in the Driver, which
>>>> are then passed to the underlying framework Driver/Topology/ToolRunner.
>>>> This is more difficult to address cleanly, so it seems like a good
>>>> place to start a discussion. I think this mechanism should be
>>>> addressed, though, as putting in options for every framework/InputFormat
>>>> everyone could want is untenable.
>>>
>>> I guess one option is to structure the monolithic CLI around plug-ins,
>>> so rather than today's
>>>
>>>     ResponderDriver <options for everything> ...
>>>
>>> it would become
>>>
>>>     ResponderDriver --pir embedSelector=true --storm option=value ...
>>>
>>> and so on; or, more likely,
>>>
>>>     ResponderDriver --pir optionsFile=pir.properties --storm optionsFile=storm.properties ...
>>>
>>> and then the driver can delegate each command line option group to the
>>> correct handler.
>>
>> Agree with this approach - as the CLI already supports reading all of the
>> properties from properties files (both local and in HDFS), it should be
>> relatively straightforward to delegate the handling.
>>
>>>> After addressing these two, based on some experiments, it looks like
>>>> breaking out Storm is pretty straightforward, and Spark should be about
>>>> the same. I'm still looking at elasticsearch.
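Tim's plug-in CLI suggestion above could be sketched as a small grouping pass over the arguments: split them at each `--framework` flag and hand each group to the matching handler. The grouping logic and names below are assumptions for illustration, not Pirk's ResponderCLI.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rough sketch of delegating grouped CLI options: arguments like
//   --pir embedSelector=true --storm optionsFile=storm.properties
// are bucketed by the flag that introduces them, and each bucket can then
// be passed to that framework's own handler (or loaded from its properties file).
public class GroupedCliSketch
{
  /** Split args into {flag -> options that followed it}, preserving order. */
  public static Map<String, List<String>> group(String[] args)
  {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    List<String> current = null;
    for (String arg : args)
    {
      if (arg.startsWith("--"))
      {
        current = new ArrayList<>();
        groups.put(arg.substring(2), current); // start a new option group
      }
      else if (current != null)
      {
        current.add(arg); // option belongs to the most recent group
      }
    }
    return groups;
  }

  public static void main(String[] args)
  {
    Map<String, List<String>> g = group(new String[] {
        "--pir", "embedSelector=true", "--storm", "optionsFile=storm.properties"});
    System.out.println(g);
    // -> {pir=[embedSelector=true], storm=[optionsFile=storm.properties]}
  }
}
```

Since, as Ellison Anne notes, the CLI already reads properties files, a group like `storm=[optionsFile=storm.properties]` would just be forwarded to the existing properties-loading path for that framework.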
>>>> Hadoop would require more and is, I think, less important for now.
>>>
>>> Much of the Hadoop dependency I see is 'services' for storing and
>>> retrieving; these could be abstracted out to a provider model.
>>
>> Agreed.
>>
>>>> I also realize there are other ways to break the modules apart, and
>>>> I'm mostly discussing modularizing the responder package; however,
>>>> that's where most of the dependencies lie, so I think that's where
>>>> we'll get the most impact.
>>>
>>> +1, the responder and CLI.
>>>
>>> Regards,
>>> Tim
>>>
>>>> On Wed, Sep 14, 2016 at 8:52 AM, Suneel Marthi <[email protected]> wrote:
>>>>
>>>>> +1 to start a sub-thread. I would suggest starting a shared Google
>>>>> Doc for dumping ideas and evolving a structure.
>>>>>
>>>>> On Wed, Sep 14, 2016 at 2:48 PM, Ellison Anne Williams <[email protected]> wrote:
>>>>>
>>>>>> Starting a new thread to discuss the Pirk submodule refactor (so
>>>>>> that we don't get too mixed up with the 'Next short term goal?'
>>>>>> thread)...
>>>>>>
>>>>>> Darin - Thanks for jumping in on the last email (I think that we hit
>>>>>> send at exactly the same time :)). Can you describe what you have in
>>>>>> mind for the submodule refactor so that we can discuss?
>>>>>>
>>>>>> (No, there is not an umbrella JIRA for producing separate Responder
>>>>>> jars - please feel free to go ahead and add one)
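The 'provider model' Tim mentions for the Hadoop storing/retrieving services could look roughly like the sketch below: core code talks only to a storage interface and looks up a concrete provider by scheme, so the HDFS implementation can live in pirk-hadoop. Every name here is illustrative, not Pirk's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a provider model for storage/retrieval: pirk-core depends only
// on StorageProvider; concrete implementations (HDFS, local FS, ...) register
// themselves from their own modules, removing the direct Hadoop dependency.
public class StorageProviderSketch
{
  public interface StorageProvider
  {
    void store(String path, byte[] data);
    byte[] retrieve(String path);
  }

  /** In-memory provider standing in for a local-FS or HDFS implementation. */
  public static class InMemoryProvider implements StorageProvider
  {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public void store(String path, byte[] data) { files.put(path, data); }

    @Override
    public byte[] retrieve(String path) { return files.get(path); }
  }

  /** Core-side registry: providers keyed by URI scheme ("hdfs", "file", ...). */
  private static final Map<String, StorageProvider> providers = new HashMap<>();

  public static void register(String scheme, StorageProvider provider)
  {
    providers.put(scheme, provider);
  }

  public static StorageProvider forScheme(String scheme)
  {
    StorageProvider p = providers.get(scheme);
    if (p == null)
    {
      throw new IllegalArgumentException("no storage provider for scheme: " + scheme);
    }
    return p;
  }

  public static void main(String[] args)
  {
    register("mem", new InMemoryProvider());
    StorageProvider p = forScheme("mem");
    p.store("query.bin", new byte[] {1, 2, 3});
    System.out.println(p.retrieve("query.bin").length); // prints 3
  }
}
```

In practice the registration step could be replaced by `java.util.ServiceLoader` discovery from the classpath, which fits the same jars-on-the-classpath approach proposed for responders.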
