Reposting the Google Doc to this thread for cohesion: https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_mMrRQyynQ-Q6MFbI/edit?usp=sharing
If there are no issues, I'd like to start this. Since it involves a lot of file moves (which are a pain to revise), my plan is to break it into a few modules at a time. That should make the reviews and testing easier as well.

On Sep 17, 2016 8:54 AM, "Darin Johnson" <[email protected]> wrote:

> Great
>
> Will have PIRK-63 sometime this weekend, which will help. Then go ahead
> with these suggestions as a base; I may come back with some thoughts about
> the CLI. I'd like for new responders not to modify pirk-core. There are a
> few ways I've done this before, but I need to decide which will be least
> intrusive and easiest to maintain.
>
> Darin
>
> On Sep 15, 2016 6:17 PM, "Ellison Anne Williams" <[email protected]> wrote:
>
>> On Thu, Sep 15, 2016 at 9:25 AM, Tim Ellison <[email protected]> wrote:
>>
>>> On 15/09/16 09:21, Darin Johnson wrote:
>>>
>>>> My goal for the submodule refactor is pretty straightforward: I
>>>> basically want to separate the project into pirk-core, pirk-hadoop,
>>>> pirk-spark, and pirk-storm. I think separating pirk-core and
>>>> pirk-hadoop is very ambitious at this point, as there are a lot of
>>>> dependencies we'd need to resolve.
>>>
>>> I think it is quite doable, but agree that it is more work than the
>>> others.
>>>
>>>> pirk-storm and pirk-spark would be much more reasonable starts. I'd
>>>> also recommend we do something about the elasticsearch dependency; it
>>>> seems more of an InputFormat option than part of pirk-core.
>>>>
>>>> There are a few blockers to this:
>>>>
>>>> The first is PIRK-63: the ResponderDriver was calling the Responder
>>>> class of each specific framework. That fix is straightforward - pass
>>>> the class as an argument. I've started that here:
>>>> https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (the PR was
>>>> expected earlier, but I had a rebase issue, so I didn't get around to
>>>> completing a few bits).
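For illustration, the PIRK-63 fix described above (pass the responder class as an argument and instantiate it reflectively) might look roughly like the sketch below. The `ResponderLauncher` interface shape and all class names here are assumptions based on the thread, not Pirk's actual API.

```java
// Hypothetical sketch: instead of a hard-coded switch over frameworks,
// the driver receives a launcher class name and loads it reflectively,
// so a new responder only needs its jar on the classpath.
public class ResponderDriverSketch
{
  /** Minimal launcher contract each framework module would implement. */
  public interface ResponderLauncher
  {
    void run() throws Exception;
  }

  /** Stand-in for a launcher that a pirk-storm style module might provide. */
  public static class DemoLauncher implements ResponderLauncher
  {
    public static boolean ran = false;

    @Override
    public void run()
    {
      ran = true; // a real launcher would start the framework topology here
    }
  }

  /** Load and run a launcher given its fully qualified class name. */
  public static void launch(String className) throws Exception
  {
    Class<?> clazz = Class.forName(className);
    ResponderLauncher launcher = (ResponderLauncher) clazz.getDeclaredConstructor().newInstance();
    launcher.run();
  }

  public static void main(String[] args) throws Exception
  {
    // In the real driver the class name would come from the CLI, e.g.
    // -launcher org.apache.pirk.responder.storm.StormLauncher (hypothetical).
    launch(ResponderDriverSketch.class.getName() + "$DemoLauncher");
    System.out.println("ran = " + DemoLauncher.ran);
  }
}
```

This keeps pirk-core ignorant of the concrete frameworks; whether `run` lives on an interface or as a static member of another class (as Darin wonders) only changes the reflective call, not the decoupling.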
>>>> It also allows, at least at a rudimentary level, adding new responders
>>>> by putting jars on the classpath instead of recompiling Pirk. I'm open
>>>> to suggestions here - I think it's very likely ResponderLauncher isn't
>>>> needed and run could instead be a static member of another class, but
>>>> based on what was in ResponderDriver this seems to be the approach with
>>>> the fewest issues - especially for Storm.
>>>
>>> Give a shout when you want somebody to take a look.
>>>
>>>> Another is how we're passing the command line options in ResponderCLI:
>>>> here we're defining framework-specific elements in the Driver, which
>>>> are then passed to the underlying framework Driver/Topology/ToolRunner.
>>>> This is more difficult to address cleanly, so it seems like a good
>>>> place to start a discussion. I think this mechanism should be
>>>> addressed, though, as putting in options for every framework/InputFormat
>>>> everyone could want is untenable.
>>>
>>> I guess one option is to structure the monolithic CLI around plug-ins,
>>> so rather than today's
>>>
>>>     ResponderDriver <options for everything> ...
>>>
>>> it would become
>>>
>>>     ResponderDriver --pir embedSelector=true --storm option=value ...
>>>
>>> and so on; or, more likely,
>>>
>>>     ResponderDriver --pir optionsFile=pir.properties --storm optionsFile=storm.properties ...
>>>
>>> and then the driver can delegate each command line option group to the
>>> correct handler.
>>
>> Agree with this approach - as the CLI already supports reading all of the
>> properties from properties files (both local and in HDFS), it should be
>> relatively straightforward to delegate the handling.
>>
>>>> After addressing these two, based on some experiments, it looks like
>>>> breaking out Storm is pretty straightforward, and Spark should be about
>>>> the same. I'm still looking at elasticsearch.
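Tim's plug-in CLI suggestion above could be sketched as a small grouping pass over the arguments: split them at each `--framework` flag and hand each group to the matching handler. The grouping logic and names below are assumptions for illustration, not Pirk's ResponderCLI.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rough sketch of delegating grouped CLI options: arguments like
//   --pir embedSelector=true --storm optionsFile=storm.properties
// are bucketed by the flag that introduces them, and each bucket can then
// be passed to that framework's own handler (or loaded from its properties file).
public class GroupedCliSketch
{
  /** Split args into {flag -> options that followed it}, preserving order. */
  public static Map<String, List<String>> group(String[] args)
  {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    List<String> current = null;
    for (String arg : args)
    {
      if (arg.startsWith("--"))
      {
        current = new ArrayList<>();
        groups.put(arg.substring(2), current); // start a new option group
      }
      else if (current != null)
      {
        current.add(arg); // option belongs to the most recent group
      }
    }
    return groups;
  }

  public static void main(String[] args)
  {
    Map<String, List<String>> g = group(new String[] {
        "--pir", "embedSelector=true", "--storm", "optionsFile=storm.properties"});
    System.out.println(g);
    // -> {pir=[embedSelector=true], storm=[optionsFile=storm.properties]}
  }
}
```

Since, as Ellison Anne notes, the CLI already reads properties files, a group like `storm=[optionsFile=storm.properties]` would just be forwarded to the existing properties-loading path for that framework.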
>>>> Hadoop would require more and is, I think, less important for now.
>>>
>>> Much of the Hadoop dependency I see is 'services' for storing and
>>> retrieving; these could be abstracted out to a provider model.
>>
>> Agreed.
>>
>>>> I also realize there are other ways to break the modules apart, and
>>>> I'm mostly discussing modularizing the responder package; however,
>>>> that's where most of the dependencies lie, so I think that's where
>>>> we'll get the most impact.
>>>
>>> +1, the responder and CLI.
>>>
>>> Regards,
>>> Tim
>>>
>>>> On Wed, Sep 14, 2016 at 8:52 AM, Suneel Marthi <[email protected]> wrote:
>>>>
>>>>> +1 to start a sub-thread. I would suggest starting a shared Google
>>>>> Doc for dumping ideas and evolving a structure.
>>>>>
>>>>> On Wed, Sep 14, 2016 at 2:48 PM, Ellison Anne Williams <[email protected]> wrote:
>>>>>
>>>>>> Starting a new thread to discuss the Pirk submodule refactor (so
>>>>>> that we don't get too mixed up with the 'Next short term goal?'
>>>>>> thread)...
>>>>>>
>>>>>> Darin - Thanks for jumping in on the last email (I think that we hit
>>>>>> send at exactly the same time :)). Can you describe what you have in
>>>>>> mind for the submodule refactor so that we can discuss?
>>>>>>
>>>>>> (No, there is not an umbrella JIRA for producing separate Responder
>>>>>> jars - please feel free to go ahead and add one)
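The 'provider model' Tim mentions for the Hadoop storing/retrieving services could look roughly like the sketch below: core code talks only to a storage interface and looks up a concrete provider by scheme, so the HDFS implementation can live in pirk-hadoop. Every name here is illustrative, not Pirk's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a provider model for storage/retrieval: pirk-core depends only
// on StorageProvider; concrete implementations (HDFS, local FS, ...) register
// themselves from their own modules, removing the direct Hadoop dependency.
public class StorageProviderSketch
{
  public interface StorageProvider
  {
    void store(String path, byte[] data);
    byte[] retrieve(String path);
  }

  /** In-memory provider standing in for a local-FS or HDFS implementation. */
  public static class InMemoryProvider implements StorageProvider
  {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public void store(String path, byte[] data) { files.put(path, data); }

    @Override
    public byte[] retrieve(String path) { return files.get(path); }
  }

  /** Core-side registry: providers keyed by URI scheme ("hdfs", "file", ...). */
  private static final Map<String, StorageProvider> providers = new HashMap<>();

  public static void register(String scheme, StorageProvider provider)
  {
    providers.put(scheme, provider);
  }

  public static StorageProvider forScheme(String scheme)
  {
    StorageProvider p = providers.get(scheme);
    if (p == null)
    {
      throw new IllegalArgumentException("no storage provider for scheme: " + scheme);
    }
    return p;
  }

  public static void main(String[] args)
  {
    register("mem", new InMemoryProvider());
    StorageProvider p = forScheme("mem");
    p.store("query.bin", new byte[] {1, 2, 3});
    System.out.println(p.retrieve("query.bin").length); // prints 3
  }
}
```

In practice the registration step could be replaced by `java.util.ServiceLoader` discovery from the classpath, which fits the same jars-on-the-classpath approach proposed for responders.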
