On 15/09/16 09:21, Darin Johnson wrote:
> So my goal for the submodule refactor is pretty straightforward: I
> basically want to separate the project into pirk-core, pirk-hadoop,
> pirk-spark, and pirk-storm.  I think separating pirk-core and pirk-hadoop
> is very ambitious at this point, as there are a lot of dependencies we'd
> need to resolve.

I think it is quite doable, but I agree that it is more work than the others.
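
For reference, I'd picture the split roughly like this (the groupings in
parentheses are just my guess at where things would land):

  pirk-parent
    pirk-core    (query/response model, encryption, common utils)
    pirk-hadoop  (MapReduce responder, HDFS helpers)
    pirk-spark   (Spark responder)
    pirk-storm   (Storm responder/topology)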

> pirk-storm and pirk-spark would be much more reasonable
> starts.  I'd also recommend we do something about the Elasticsearch
> dependency; it seems more of an InputFormat option than part of pirk-core.
> 
> There are a few blockers to this:
> 
> The first is PIRK-63: here the ResponderDriver was calling the Responder
> class of each specific framework.  The fix is straightforward: pass the
> class as an argument.  I've started that here:
> https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (a PR was expected
> earlier, but I had a rebase issue, so I didn't get around to completing a
> few bits).  It also allows, at least at a rudimentary level, adding new
> responders by putting jars on the classpath rather than recompiling Pirk.
> I'm open to suggestions here - I think it's very likely ResponderLauncher
> isn't needed and run could instead be a static member of another class;
> however, based on what was in ResponderDriver, this seems to be the
> approach with the fewest issues - especially for Storm.

Give a shout when you want somebody to take a look.
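
If it helps the discussion, here is a minimal sketch of how I read the
launcher idea - ResponderLauncher and ResponderDriver are the names from
your branch and the codebase, but the code below is purely my assumption:

  // Each framework module ships one implementation of this interface.
  public interface ResponderLauncher
  {
    void run() throws Exception;
  }

  // The driver reflectively loads whichever implementation is named on
  // the command line, so adding a responder only means adding a jar to
  // the classpath - no recompile of pirk-core.
  final class ResponderDriver
  {
    public static void main(String[] args) throws Exception
    {
      // args[0]: fully-qualified launcher class name (illustrative)
      ResponderLauncher launcher = (ResponderLauncher)
          Class.forName(args[0]).getDeclaredConstructor().newInstance();
      launcher.run();
    }
  }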

> Another is how we're passing the command line options in ResponderCLI:
> here we're defining framework-specific elements in the Driver which are
> then passed to the underlying framework Driver/Topology/ToolRunner.  This
> is more difficult to address cleanly, so it seems like a good place to
> start a discussion.  I think this mechanism should be addressed, though,
> as putting options for every framework/InputFormat anyone could want
> into the CLI is untenable.

I guess one option is to structure the monolithic CLI around plug-ins, so
rather than today's
  ResponderDriver <options for everything> ...

it would become
  ResponderDriver --pir embedSelector=true --storm option=value ...

and so on; or more likely
  ResponderDriver --pir optionsFile=pir.properties \
      --storm optionsFile=storm.properties ...

and then the driver can delegate each command line option group to the
correct handler.
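
A rough sketch of that delegation (class and method names are
illustrative, not from the codebase):

  import java.util.*;

  // Group "--<plugin> key=value ..." arguments by plugin name, so each
  // framework interprets its own options and the core CLI stays
  // ignorant of them.
  public final class OptionGroups
  {
    public static Map<String, Properties> parse(String[] args)
    {
      Map<String, Properties> groups = new HashMap<>();
      String current = null;
      for (String arg : args)
      {
        if (arg.startsWith("--"))
        {
          current = arg.substring(2);        // e.g. "pir", "storm"
          groups.putIfAbsent(current, new Properties());
        }
        else if (current != null)
        {
          String[] kv = arg.split("=", 2);
          groups.get(current).setProperty(kv[0], kv.length > 1 ? kv[1] : "");
        }
      }
      return groups;
    }
  }

Then groups.get("storm") holds only the Storm options, ready to hand to
the Storm responder's own handler.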

> After addressing these two, based on some experiments, it looks like
> breaking out Storm is pretty straightforward, and Spark should be about
> the same.  I'm still looking at Elasticsearch.  Hadoop would require more
> work, and I think it's less important for now.

Much of the Hadoop dependency I see is 'services' for storing and
retrieving; these could be abstracted out to a provider model.
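
Purely as an illustration of the provider idea (the interface and names
are my assumption, not existing code):

  import java.io.*;
  import java.util.ServiceLoader;

  // pirk-core would own this interface and never touch HDFS directly.
  public interface StorageProvider
  {
    InputStream read(String path) throws IOException;
    OutputStream write(String path) throws IOException;
  }

  // pirk-hadoop ships an HDFS-backed implementation, registered under
  // META-INF/services/, and core discovers it at runtime without any
  // compile-time Hadoop dependency.
  final class Storage
  {
    static StorageProvider load()
    {
      return ServiceLoader.load(StorageProvider.class).iterator().next();
    }
  }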

> I also realize there are other ways to break the modules apart, and I'm
> mostly discussing modularizing the responder package; however, that's
> where most of the dependencies lie, so I think that's where we'll get
> the most impact.

+1, the responder and CLI.

Regards,
Tim


> On Wed, Sep 14, 2016 at 8:52 AM, Suneel Marthi <[email protected]>
> wrote:
> 
>> +1 to start a sub-thread. I would suggest starting a shared Google Doc
>> for dumping ideas and evolving a structure.
>>
>> On Wed, Sep 14, 2016 at 2:48 PM, Ellison Anne Williams <
>> [email protected]> wrote:
>>
>>> Starting a new thread to discuss the Pirk submodule refactor (so that we
>>> don't get too mixed up with the 'Next short term goal?' thread)...
>>>
>>> Darin - Thanks for jumping in on the last email (I think that we hit send
>>> at exactly the same time :)). Can you describe what you have in mind for
>>> the submodule refactor so that we can discuss?
>>>
>>> (No, there is not an umbrella JIRA for producing separate Responder jars
>> -
>>> please feel free to go ahead and add one)
>>>
>>
> 
