Ellison Anne,

Good to hear you're in favor.
Yes, the name of the class would be parsed from the command line or taken from the properties file. I plan to use the same mechanism as is currently used to get the platform variable.

I'm pro using spark-submit in general; moving to hadoop jar removes spark standalone/mesos users.

I'll try to knock out the properties in the next day or so.

Cheers,
Darin

On Sep 11, 2016 11:50 AM, "Ellison Anne Williams" <[email protected]> wrote:

Hi Darin,

I think that generalizing the Responder launching in the ResponderDriver (and elsewhere) with a ResponderLauncher interface makes a lot of sense and is 'in the spirit' of some of the other generalities within the codebase (in the schemas, partitioners, base input format, etc).

I am assuming that the name of the specific ResponderLauncher implementation class would be passed as an argument (or parsed from the properties file) to the ResponderDriver via the same ResponderDriverCLI mechanisms. The ResponderDriver would then instantiate that class, launching the desired Responder. Is this what you had in mind?

If so, the only decision point for us is whether or not Spark-based Responders should be run with spark-submit (i.e., calling the ResponderDriver with spark-submit, the way it's currently done) or if the implementations of ResponderLauncher should in turn call SparkLauncher (meaning that the ResponderDriver could be called with hadoop jar). The only considerations in forcing Spark-based Responders to use the SparkLauncher are (1) that it becomes a bit more tricky to launch with SparkLauncher, as the 'spark-home' (the dir containing the spark-submit script) can be difficult to pick up correctly on some systems (we've specifically had trouble with AWS and GCP), and (2) all Spark-related configs must be passed as args to the SparkLauncher.

I'm not concerned about altering the API at this point as we are only on release 0.1.0 -- we need to stabilize the API before a 1.0.0 release, but we can change it in ways that make sense now to move closer to a stable API.

I am in agreement to proceed with the PR. Thoughts?

Thanks!

Ellison Anne

On Sat, Sep 10, 2016 at 9:34 PM, Darin Johnson <[email protected]> wrote:

> Hey guys,
>
> I was looking into creating my own responder as a general exercise, but as
> the jar was getting pretty big, I thought it might be useful to first
> create a modular build, as someone using hadoop would not want to push
> around storm dependencies and vice versa. As I was scoping this, I noticed
> that ResponderDriver contains the following block:
>
> switch (platform)
> {
>   case MAPREDUCE: ...
>   case SPARK: ...
>   case SPARKSTREAMING: ...
>   case STORM: ...
>   case STANDALONE: ...
> }
>
> This essentially means that pirk must know about all platforms in order to
> run. I think a better approach might be to create an interface,
> "ResponderLauncher", which the developer of a platform would implement;
> the implementing class name would be passed on the command line or via
> configuration and loaded at runtime via reflection (this is how hadoop
> allows different schedulers).
>
> This would allow better extensibility to other platforms, especially for
> users using proprietary or non-apache-license-compatible tools, along with
> starting the process of a multi-module build. Then one could just put
> additional jars in the classpath and run, vs modifying the pirk code to
> get their platform included.
>
> I believe something like:
>
> public interface ResponderLauncher {
>   public void run(ConfigOpts opts);
>   public void run();
> }
>
> would likely do. Here ConfigOpts is fictional and doesn't appear
> necessary, but I thought I should offer some possibility for passing
> command-line or other options - suggestions welcome.
>
> I think I could get this done, along with the [Hadoop, Spark,
> SparkStreaming, Storm]ResponderLauncher classes, rather quickly, but as
> this would be my first work on this project, I thought it'd be good to
> solicit opinions first - especially as it's API-breaking and you look to
> be attempting semantic versioning.
>
> If this works for everyone, I'd be willing to submit this as a PR, and a
> second modularizing the build (which will likely be preceded by another
> email discussion). Though, I envision it creating pirk-core, pirk-hadoop,
> pirk-spark, and pirk-storm artifacts which could then be deployed to
> central.
>
> Cheers,
> Darin
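
For concreteness, here is a minimal sketch of the reflection-based loading proposed above, modeled on how hadoop selects schedulers. The property name "responder.launcher.class" and the no-arg run() are illustrative assumptions, not settled Pirk API:

    // ResponderLauncher.java
    public interface ResponderLauncher
    {
      void run();
    }

    // ResponderDriver.java
    public class ResponderDriver
    {
      public static void main(String[] args) throws Exception
      {
        // In Pirk this name would come from the CLI or the properties file,
        // via the same mechanism that currently supplies the platform variable.
        String className = System.getProperty("responder.launcher.class");

        // Instantiate the implementation by name; the driver needs no
        // compile-time dependency on any particular platform.
        ResponderLauncher launcher = (ResponderLauncher) Class.forName(className)
            .getDeclaredConstructor().newInstance();
        launcher.run();
      }
    }

With this in place, adding a platform means putting its jar on the classpath and pointing the property at its launcher class, with no change to pirk itself.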

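And a rough sketch of the SparkLauncher route Ellison Anne describes: a ResponderLauncher implementation that calls Spark's SparkLauncher itself, so that the ResponderDriver can be started with hadoop jar. The spark home, jar path, and main class below are placeholders; note how the spark-home must be supplied explicitly and every Spark config passed as an arg - the two considerations raised above:

    // SparkResponderLauncher.java
    import org.apache.spark.launcher.SparkLauncher;

    public class SparkResponderLauncher implements ResponderLauncher
    {
      @Override
      public void run()
      {
        try
        {
          Process spark = new SparkLauncher()
              .setSparkHome("/usr/lib/spark") // the 'spark-home' that is hard to detect on AWS/GCP
              .setAppResource("/path/to/pirk.jar") // placeholder jar path
              .setMainClass("org.apache.pirk.responder.spark.ComputeResponse") // placeholder class
              .setMaster("yarn")
              .setConf(SparkLauncher.DRIVER_MEMORY, "2g") // all Spark configs must be passed here
              .launch();
          spark.waitFor();
        }
        catch (Exception e)
        {
          throw new RuntimeException("Failed to launch the Spark responder", e);
        }
      }
    }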