So my goal for the submodule refactor is pretty straightforward: I
basically want to separate the project into pirk-core, pirk-hadoop,
pirk-spark, and pirk-storm.  I think separating pirk-core and pirk-hadoop
is very ambitious at this point, as there are a lot of dependencies we'd
need to resolve.  pirk-storm and pirk-spark would be much more reasonable
starts.  I'd also recommend we do something about the Elasticsearch
dependency; it seems more of an InputFormat option than part of pirk-core.

There are a few blockers to this:

The first is PIRK-63: the ResponderDriver was calling the Responder
class of each specific framework.  The fix is straightforward: pass the
class as an argument.  I've started that here:
https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (a PR was expected
earlier, but I had a rebase issue and didn't get around to completing a few
bits).  It also allows, at least at a rudimentary level, adding new
responders by putting jars on the classpath instead of recompiling Pirk.
I'm open to suggestions here - I think it's very likely ResponderLauncher
isn't needed and run could instead be a static member of another class.
However, based on what was in ResponderDriver, this seems to be the
approach with the fewest issues - especially for Storm.
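
To make the "pass the class as an argument" idea concrete, here is a rough
sketch.  It is not the code on the Pirk-63 branch: the interface shape, the
EchoLauncher stand-in, and the load helper are all made up for illustration.
The only assumption is that each framework module ships a class implementing
a small launcher interface, and the driver instantiates it by name from
whatever jars are on the classpath.

```java
import java.util.Arrays;

public class LauncherSketch {

    /** Hypothetical contract that each framework module would implement. */
    public interface ResponderLauncher {
        void run(String[] frameworkArgs);
    }

    /** Stand-in for a framework-specific launcher (illustration only). */
    public static class EchoLauncher implements ResponderLauncher {
        public static String lastArgs = "";

        @Override
        public void run(String[] frameworkArgs) {
            // A real launcher would start the Storm topology, Spark job, etc.
            lastArgs = String.join(",", frameworkArgs);
        }
    }

    /** Instantiate a launcher by class name using its no-arg constructor. */
    public static ResponderLauncher load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            if (!ResponderLauncher.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(
                    className + " does not implement ResponderLauncher");
            }
            return (ResponderLauncher) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load launcher " + className, e);
        }
    }

    public static void main(String[] args) {
        // First argument names the launcher class; the rest is passed through untouched.
        String className = args.length > 0 ? args[0] : EchoLauncher.class.getName();
        String[] rest = args.length > 1
            ? Arrays.copyOfRange(args, 1, args.length)
            : new String[0];
        load(className).run(rest);
    }
}
```

With something like this the driver needs no compile-time dependency on any
framework; adding a responder means dropping a jar on the classpath and
naming its launcher class.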

Another is how we're passing the command line options in ResponderCLI.  Here
we're defining framework-specific options on the Driver, which are then
passed to the underlying framework Driver/Topology/ToolRunner.  This
is more difficult to address cleanly, so it seems like a good place to
start a discussion.  I think this mechanism should be addressed, though, as
putting in options for every framework/InputFormat anyone could want is
untenable.
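
As one possible starting point for that discussion (purely a sketch, not
something that exists in Pirk today): the core CLI could parse only the
options it owns and hand everything after a "--" separator to the selected
framework untouched.  The class name, the separator convention, and the
flags in the usage note are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ArgSplitSketch {

    /** Hypothetical marker separating core options from framework options. */
    public static final String SEPARATOR = "--";

    /**
     * Returns a two-element list: the core arguments, then the framework arguments.
     * Everything after the first "--" is treated as framework-specific.
     */
    public static List<List<String>> split(String[] argv) {
        List<String> all = Arrays.asList(argv);
        int idx = all.indexOf(SEPARATOR);

        List<String> core = new ArrayList<>(idx < 0 ? all : all.subList(0, idx));
        List<String> framework = new ArrayList<>(
            idx < 0 ? Collections.<String>emptyList() : all.subList(idx + 1, all.size()));

        List<List<String>> parts = new ArrayList<>();
        parts.add(core);
        parts.add(framework);
        return parts;
    }
}
```

For example, splitting the (made-up) arguments "-platform storm -- -workers 4"
would leave "-platform storm" for the core driver and pass "-workers 4" to
the framework's own Driver/Topology/ToolRunner to interpret.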

After addressing these two, based on some experiments it looks like
breaking out Storm is pretty straightforward, and Spark should be about the
same.  I'm still looking at Elasticsearch.  Hadoop would require more work
and is, I think, less important for now.

I also realize there are other ways to break the modules apart, and I'm
mostly discussing modularizing the responder package.  However, that's where
most of the dependencies lie, so I think that's where we'll get the most
impact.

Darin

On Wed, Sep 14, 2016 at 8:52 AM, Suneel Marthi <suneel.mar...@gmail.com>
wrote:

> +1 to start a sub-thread. I would suggest to start a shared Google Doc for
> dumping ideas and evolving a structure.
>
> On Wed, Sep 14, 2016 at 2:48 PM, Ellison Anne Williams <
> eawilli...@apache.org> wrote:
>
> > Starting a new thread to discuss the Pirk submodule refactor (so that we
> > don't get too mixed up with the 'Next short term goal?' thread)...
> >
> > Darin - Thanks for jumping in on the last email (I think that we hit send
> > at exactly the same time :)). Can you describe what you have in mind for
> > the submodule refactor so that we can discuss?
> >
> > (No, there is not an umbrella JIRA for producing separate Responder
> > jars - please feel free to go ahead and add one)
> >
>
