Ellison Anne,

Good to hear you're in favor.
Yes, the name of the class would be parsed from the command line or taken from the properties file. I plan to use the same mechanism as is currently used to get the platform variable.

I'm pro using spark-submit in general; moving to hadoop jar removes spark standalone/mesos users.

I'll try to knock out the properties in the next day or so.

Cheers,
Darin

On Sep 11, 2016 11:50 AM, "Ellison Anne Williams" <[email protected]> wrote:

Hi Darin,

I think that generalizing the Responder launching in the ResponderDriver (and elsewhere) with a ResponderLauncher interface makes a lot of sense and is 'in the spirit' of some of the other generalities within the codebase (in the schemas, partitioners, base input format, etc).

I am assuming that the name of the specific ResponderLauncher implementation class would be passed as an argument (or parsed from the properties file) to the ResponderDriver via the same ResponderDriverCLI mechanisms. The ResponderDriver would then instantiate that class, launching the desired Responder. Is this what you had in mind?

If so, the only decision point for us is whether or not Spark-based Responders should be run with spark-submit (i.e., calling the ResponderDriver with spark-submit, the way it's currently done) or if the implementations of ResponderLauncher should in turn call SparkLauncher (meaning that the ResponderDriver could be called with hadoop jar). The only considerations in forcing Spark-based Responders to use the SparkLauncher are (1) that it becomes a bit more tricky to launch with SparkLauncher, as the 'spark-home' (the dir containing the spark-submit script) can be difficult to pick up correctly on some systems (we've specifically had trouble with AWS and GCP), and (2) all Spark-related configs must be passed as args to the SparkLauncher.

I'm not concerned about altering the API at this point as we are only on release 0.1.0 -- we need to stabilize the API before a 1.0.0 release, but we can change it in ways that make sense now to move closer to a stable API.

I am in agreement to proceed with the PR. Thoughts?

Thanks!

Ellison Anne

On Sat, Sep 10, 2016 at 9:34 PM, Darin Johnson <[email protected]> wrote:

> Hey guys,
>
> I was looking into creating my own responder as a general exercise, but as
> the jar was getting pretty big, I thought it might be useful to first
> create a modular build, as someone using hadoop would not want to push
> around storm dependencies and vice versa. As I was scoping this, I noticed
> that ResponderDriver contains the following block:
>
> switch (platform)
> {
>   case MAPREDUCE: ...
>   case SPARK: ...
>   case SPARKSTREAMING: ...
>   case STORM: ...
>   case STANDALONE: ...
> }
>
> This essentially means that pirk must know about all platforms in order to
> run. I think a better approach might be to create an interface,
> "ResponderLauncher", which the developer of a platform would implement;
> the implementing class name would be passed on the command line or via
> configuration and loaded at runtime via reflection (this is how hadoop
> allows different schedulers).
>
> This would allow better extensibility to other platforms, especially for
> users using proprietary or non-apache-license-compatible tools, along with
> starting the process of a multi-module build. Then one could just put
> additional jars in the classpath and run, vs modifying the pirk code to
> get their platform included.
>
> I believe something like:
>
> public interface ResponderLauncher {
>   public void run(ConfigOpts opts);
>   public void run();
> }
>
> would likely do. Here ConfigOpts is fictional and doesn't appear
> necessary, but I thought I should offer some possibility for passing
> command-line or other options - suggestions welcome.
>
> I think I could get this done, along with the [Hadoop, Spark,
> SparkStreaming, Storm]ResponderLauncher classes, rather quickly, but as
> this would be my first work on this project, I thought it'd be good to
> solicit opinions first - especially as it's API-breaking and you look to
> be attempting semantic versioning.
>
> If this works for everyone, I'd be willing to submit this as a PR, and a
> second modularizing the build (which will likely be preceded by another
> email discussion). Though, I envision it creating pirk-core, pirk-hadoop,
> pirk-spark, and pirk-storm artifacts which could then be deployed to
> central.
>
> Cheers,
> Darin
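
For concreteness, here is a minimal sketch of the reflection-based loading proposed above, modeled on how hadoop selects schedulers. The property name "responder.launcher.class" and the no-arg run() are illustrative assumptions, not settled Pirk API:

    // ResponderLauncher.java
    public interface ResponderLauncher
    {
      void run();
    }

    // ResponderDriver.java
    public class ResponderDriver
    {
      public static void main(String[] args) throws Exception
      {
        // In Pirk this name would come from the CLI or the properties file,
        // via the same mechanism that currently supplies the platform variable.
        String className = System.getProperty("responder.launcher.class");

        // Instantiate the implementation by name; the driver needs no
        // compile-time dependency on any particular platform.
        ResponderLauncher launcher = (ResponderLauncher) Class.forName(className)
            .getDeclaredConstructor().newInstance();
        launcher.run();
      }
    }

With this in place, adding a platform means putting its jar on the classpath and pointing the property at its launcher class, with no change to pirk itself.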

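And a rough sketch of the SparkLauncher route Ellison Anne describes: a ResponderLauncher implementation that calls Spark's SparkLauncher itself, so that the ResponderDriver can be started with hadoop jar. The spark home, jar path, and main class below are placeholders; note how the spark-home must be supplied explicitly and every Spark config passed as an arg - the two considerations raised above:

    // SparkResponderLauncher.java
    import org.apache.spark.launcher.SparkLauncher;

    public class SparkResponderLauncher implements ResponderLauncher
    {
      @Override
      public void run()
      {
        try
        {
          Process spark = new SparkLauncher()
              .setSparkHome("/usr/lib/spark") // the 'spark-home' that is hard to detect on AWS/GCP
              .setAppResource("/path/to/pirk.jar") // placeholder jar path
              .setMainClass("org.apache.pirk.responder.spark.ComputeResponse") // placeholder class
              .setMaster("yarn")
              .setConf(SparkLauncher.DRIVER_MEMORY, "2g") // all Spark configs must be passed here
              .launch();
          spark.waitFor();
        }
        catch (Exception e)
        {
          throw new RuntimeException("Failed to launch the Spark responder", e);
        }
      }
    }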