Great, will write up the doc and link it here, finish pirk-63, then start this.

On Sep 19, 2016 5:34 PM, "Suneel Marthi" <suneel.mar...@gmail.com> wrote:
> +100
>
> On Mon, Sep 19, 2016 at 11:24 PM, Ellison Anne Williams <eawilli...@apache.org> wrote:
>
> > Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a
> > separate submodule.
> >
> > Aside from pirk-core, it seems that we would want to break the responder
> > implementations out into submodules. This would leave us with something
> > along the lines of the following (at this point):
> >
> > pirk-core (encryption, core responder incl. standalone, core querier,
> > query, inputformat, serialization, utils)
> > pirk-storm
> > pirk-mapreduce
> > pirk-spark
> > pirk-benchmark
> > pirk-distributed-test
> >
> > Once we add other responder implementations, we can add them as submodules
> > - i.e. for Flink, we would have pirk-flink; for Beam, pirk-beam, etc.
> >
> > We could break 'pirk-core' down further...
> >
> > On Mon, Sep 19, 2016 at 5:10 PM, Suneel Marthi <suneel.mar...@gmail.com> wrote:
> >
> > > Here's an example from the Flink project of how they go about new features
> > > or system-breaking API changes; we could start a similar process. The Flink
> > > guys call these FLIP (Flink Improvement Proposal), and the Kafka community
> > > similarly has something called KIP.
> > >
> > > We could start a PLIP (??? :-) )
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673
> > >
> > >
> > > On Mon, Sep 19, 2016 at 11:07 PM, Suneel Marthi <suneel.mar...@gmail.com> wrote:
> > >
> > > > A shared Google doc would be more convenient than a bunch of Jiras. It's
> > > > easier to comment and add notes that way.
> > > >
> > > >
> > > > On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <dbjohnson1...@gmail.com> wrote:
> > > >
> > > >> Suneel, I'll try to put a couple of jiras on it tonight with my thoughts.
> > > >> Based off my pirk-63 I was able to pull spark and storm out with no
> > > >> issues. I was planning to pull them out, then tackle elastic search,
> > > >> then hadoop, as it's a little entrenched. This should keep most PRs to
> > > >> manageable chunks. I think once that's done, addressing the configs will
> > > >> make more sense.
> > > >>
> > > >> I'm open to suggestions. But the hope would be:
> > > >> Pirk-parent
> > > >> Pirk-core
> > > >> Pirk-hadoop
> > > >> Pirk-storm
> > > >> Pirk-parent
> > > >>
> > > >> Pirk-es is a little weird as it's really just an inputformat; it seems like
> > > >> there's a more general solution here than creating submodules for every
> > > >> inputformat.
> > > >>
> > > >> Darin
> > > >>
> > > >> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <smar...@apache.org> wrote:
> > > >>
> > > >> > Refactor is definitely a first priority. Is there a design/proposal
> > > >> > draft that we could comment on about how to go about refactoring the
> > > >> > code? I have been trying to keep up with the emails but definitely would
> > > >> > have missed some.
> > > >> >
> > > >> > On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams
> > > >> > <eawilli...@apache.org> wrote:
> > > >> >
> > > >> > > Agree - let's leave the config/CLI the way it is for now and tackle
> > > >> > > that as a subsequent design discussion and PR.
> > > >> > >
> > > >> > > Also, I think that we should leave the ResponderDriver and the
> > > >> > > ResponderProps alone for this PR and push to a subsequent PR (once we
> > > >> > > decide if and how we would like to delegate each).
> > > >> > >
> > > >> > > I vote to remove the 'platform' option and the backwards compatibility
> > > >> > > in this PR and proceed with having a ResponderLauncher interface and
> > > >> > > forcing its implementation by the ResponderDriver.
> > > >> > >
> > > >> > > And, I am not so concerned with having one fat jar vs. multiple jars
> > > >> > > right now - to me, at this point, it's a 'nice to have' and not a 'must
> > > >> > > have' for Pirk functionality. We do need to break out Pirk into more
> > > >> > > clearly defined submodules (which is in progress) - via this re-factor,
> > > >> > > I think that we will gain some ability to generate multiple jars, which
> > > >> > > is nice.
> > > >> > >
> > > >> > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <t.p.elli...@gmail.com> wrote:
> > > >> > >
> > > >> > > > On 19/09/16 15:46, Darin Johnson wrote:
> > > >> > > > > Hey guys,
> > > >> > > > >
> > > >> > > > > Thanks for looking at the PR, I apologize if it offended anyone's
> > > >> > > > > eyes :).
> > > >> > > > >
> > > >> > > > > I'm glad it generated some discussion about the configuration. I
> > > >> > > > > didn't really like where things were heading with the config.
> > > >> > > > > However, I didn't want to create too much scope creep.
> > > >> > > > >
> > > >> > > > > I think any hierarchical config (TypeSafe or yaml) would make things
> > > >> > > > > much more maintainable; the plugin could simply grab the appropriate
> > > >> > > > > part of the config and handle it accordingly. I'd also cut down the
> > > >> > > > > number of command line options to only those that change between
> > > >> > > > > runs often (like input/output).
> > > >> > > > >
> > > >> > > > >> One option is to make Pirk pluggable, so that a Pirk installation
> > > >> > > > >> could use one or more of these in an extensible fashion by adding
> > > >> > > > >> JAR files. That would still require selecting one by command-line
> > > >> > > > >> argument.
> > > >> > > > >
> > > >> > > > > An argument for this approach is lambda architecture approaches (say
> > > >> > > > > spark/spark-streaming), where the contents of the jars would be so
> > > >> > > > > similar that it seems like too much trouble to create separate jars.
> > > >> > > > >
> > > >> > > > > Happy to continue working on this given some direction on where
> > > >> > > > > you'd like it to go. Also, it's a bit of a blocker to refactoring
> > > >> > > > > the build into submodules.
> > > >> > > >
> > > >> > > > FWIW my 2c is to not try and fix all the problems in one go, and rather
> > > >> > > > take a compromise on the configurations while you tease apart the
> > > >> > > > submodules into separate source code trees, poms, etc; then come back
> > > >> > > > and fix the runtime configs.
> > > >> > > >
> > > >> > > > Once the submodules are in place it will open up more work for release
> > > >> > > > engineering and tinkering that can be done in parallel with the config
> > > >> > > > polishing.
> > > >> > > >
> > > >> > > > Just a thought.
> > > >> > > > Tim
> > > >> > > >
> > > >> > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <t.p.elli...@gmail.com> wrote:
> > > >> > > > >
> > > >> > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > > >> > > > >>> It seems that it's the same idea as the ResponderLauncher with the
> > > >> > > > >>> service component added to maintain something akin to the
> > > >> > > > >>> 'platform'. I would prefer that we just did away with the platform
> > > >> > > > >>> notion altogether and make the ResponderDriver 'dumb'. We get
> > > >> > > > >>> around needing a platform-aware service by requiring the
> > > >> > > > >>> ResponderLauncher implementation to be passed as a CLI argument to
> > > >> > > > >>> the ResponderDriver.
> > > >> > > > >>
> > > >> > > > >> Let me check I understand what you are saying here.
> > > >> > > > >>
> > > >> > > > >> At the moment, there is a monolithic Pirk that hard codes how to
> > > >> > > > >> respond using lots of different backends (mapreduce, spark,
> > > >> > > > >> sparkstreaming, storm, standalone), and that is selected by
> > > >> > > > >> command-line argument.
> > > >> > > > >>
> > > >> > > > >> One option is to make Pirk pluggable, so that a Pirk installation
> > > >> > > > >> could use one or more of these in an extensible fashion by adding
> > > >> > > > >> JAR files. That would still require selecting one by command-line
> > > >> > > > >> argument.
> > > >> > > > >>
> > > >> > > > >> A second option is to simply pass in the required backend JAR to
> > > >> > > > >> select the particular implementation you choose, as a specific Pirk
> > > >> > > > >> installation doesn't need to use multiple backends simultaneously.
> > > >> > > > >>
> > > >> > > > >> ...and you are leaning towards the second option. Do I have that
> > > >> > > > >> correct?
> > > >> > > > >>
> > > >> > > > >> Regards,
> > > >> > > > >> Tim
> > > >> > > > >>
> > > >> > > > >>> Am I missing something? Is there a good reason to provide a service
> > > >> > > > >>> by which platforms are registered? I'm open...
> > > >> > > > >>>
> > > >> > > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <t.p.elli...@gmail.com> wrote:
> > > >> > > > >>>
> > > >> > > > >>>> How about an approach like this?
> > > >> > > > >>>> https://github.com/tellison/incubator-pirk/tree/pirk-63
> > > >> > > > >>>>
> > > >> > > > >>>> The "on-ramp" is the driver [1], which calls upon the service to
> > > >> > > > >>>> find a plug-in [2] that claims to implement the required platform
> > > >> > > > >>>> responder, e.g. [3].
> > > >> > > > >>>>
> > > >> > > > >>>> The list of plug-ins is given in the provider's JAR file, so the
> > > >> > > > >>>> ones we provide in Pirk are listed together [4], but if you split
> > > >> > > > >>>> these into modules, or somebody brings their own JAR alongside,
> > > >> > > > >>>> these would be listed in each JAR's services/ directory.
> > > >> > > > >>>>
> > > >> > > > >>>> [1] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
> > > >> > > > >>>> [2] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
> > > >> > > > >>>> [3] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
> > > >> > > > >>>> [4] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/services/org.apache.responder.spi.Responder
> > > >> > > > >>>>
> > > >> > > > >>>> I'm not even going to dignify this with a WIP PR, it is far from
> > > >> > > > >>>> ready, so proceed with caution. There is hopefully enough there to
> > > >> > > > >>>> show the approach, and if it is worth continuing I'm happy to do so.
> > > >> > > > >>>>
> > > >> > > > >>>> Regards,
> > > >> > > > >>>> Tim
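
For anyone skimming the thread later, here is a rough sketch of the "dumb" ResponderDriver idea discussed above, where the ResponderLauncher implementation is named on the command line instead of being chosen through a 'platform' option. Only the names ResponderLauncher and ResponderDriver come from the thread; the run() method, its return value, and the two *Sketch classes are assumptions made up for illustration and are not taken from the Pirk code base.

```java
// Hypothetical contract: one implementation per backend (storm, spark, ...).
// The thread names the interface but not its methods; run() is an assumption.
interface ResponderLauncher {
    // Runs the responder and returns a short status string (illustrative only).
    String run();
}

// Stand-in for a backend that would live in its own submodule/JAR.
class StandaloneLauncherSketch implements ResponderLauncher {
    @Override
    public String run() {
        return "standalone responder finished";
    }
}

// A "dumb" driver: no platform switch, it only instantiates the named class.
class ResponderDriverSketch {
    static ResponderLauncher load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            if (!ResponderLauncher.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                    className + " does not implement ResponderLauncher");
            }
            // Requires a no-argument constructor on the launcher class.
            return (ResponderLauncher) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load launcher " + className, e);
        }
    }

    public static void main(String[] args) {
        // Launcher class name as the first CLI argument; defaults to the stand-in.
        String className = args.length > 0
            ? args[0]
            : StandaloneLauncherSketch.class.getName();
        System.out.println(load(className).run());
    }
}
```

With this shape, adding a backend means dropping its JAR on the classpath and passing its launcher class name; the driver itself never needs to know which platforms exist. The service-based alternative in the pirk-63 branch would instead discover implementations from the JARs' services/ listings.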