On 21/09/16 03:22, Ellison Anne Williams wrote:
> I am in favor of breaking out pirk-core as specified so that our initial
> submodule structure would be as follows:
>
> pirk-core (encryption, query, inputformat, serialization, utils)
> pirk-responder (core responder incl. standalone)
> pirk-querier
> pirk-storm
> pirk-mapreduce
> pirk-spark
> pirk-benchmark
> pirk-distributed-test
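[Editor's note: for concreteness, the parent aggregator pom for the breakdown above might look something like this sketch. The module names come from the list in the proposal; the groupId/version coordinates are placeholders, not Pirk's actual ones.]

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Placeholder coordinates for illustration only -->
  <groupId>org.apache.pirk</groupId>
  <artifactId>pirk-parent</artifactId>
  <version>0.2.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <!-- One module per submodule in the proposed breakdown -->
  <modules>
    <module>pirk-core</module>
    <module>pirk-responder</module>
    <module>pirk-querier</module>
    <module>pirk-storm</module>
    <module>pirk-mapreduce</module>
    <module>pirk-spark</module>
    <module>pirk-benchmark</module>
    <module>pirk-distributed-test</module>
  </modules>
</project>
```

Under this layout the es-hadoop dependency discussed below would be declared only in the poms of the modules that need it (pirk-mapreduce, pirk-spark, pirk-distributed-test), keeping it out of pirk-core.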
Yes, I certainly wouldn't split it up any more than this yet.

> One thing to note is that under this breakdown, pirk-core would not include
> the Elasticsearch dependency (es-hadoop). The only submodules that would
> have the es-hadoop dependency (those which need it) currently are
> pirk-mapreduce, pirk-spark, and pirk-distributed-test.
>
> I believe that we agreed (somewhere :)) in this thread to go ahead and
> remove the platform 'backwards compatibility' for PIRK-63. Please holler if
> this is not correct.

I agree. While it is trivial to maintain that compatibility, it feels like
we are still in an era where we should use the freedom to drop it.

Regards,
Tim

> On Tue, Sep 20, 2016 at 9:40 PM, Darin Johnson <[email protected]>
> wrote:
>
>> Suneel, a google doc as promised, only a day late (sorry - sick kid).
>>
>> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_mMrRQyynQ-Q6MFbI/edit?usp=sharing
>>
>> I was planning on working on this, but I'm going to take a day or two to
>> let others comment.
>>
>> Darin
>>
>> On Mon, Sep 19, 2016 at 5:07 PM, Suneel Marthi <[email protected]>
>> wrote:
>>
>>> A shared Google doc would be more convenient than a bunch of Jiras. It's
>>> easier to comment and add notes that way.
>>>
>>> On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <[email protected]>
>>> wrote:
>>>
>>>> Suneel, I'll try to put a couple of jiras on it tonight with my
>>>> thoughts. Based off my pirk-63 work I was able to pull spark and storm
>>>> out with no issues. I was planning to pull them out, then tackle
>>>> Elasticsearch, then hadoop, as it's a little entrenched. This should
>>>> keep most PRs to manageable chunks. I think once that's done,
>>>> addressing the configs will make more sense.
>>>>
>>>> I'm open to suggestions.
>>>> But the hope would be:
>>>>
>>>> Pirk-parent
>>>> Pirk-core
>>>> Pirk-hadoop
>>>> Pirk-storm
>>>> Pirk-spark
>>>>
>>>> Pirk-es is a little weird as it's really just an inputformat; it seems
>>>> like there's a more general solution here than creating submodules for
>>>> every inputformat.
>>>>
>>>> Darin
>>>>
>>>> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <[email protected]> wrote:
>>>>
>>>>> Refactoring is definitely a first priority. Is there a design/proposal
>>>>> draft that we could comment on about how to go about refactoring the
>>>>> code? I have been trying to keep up with the emails but definitely
>>>>> would have missed some.
>>>>>
>>>>> On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Agree - let's leave the config/CLI the way it is for now and tackle
>>>>>> that as a subsequent design discussion and PR.
>>>>>>
>>>>>> Also, I think that we should leave the ResponderDriver and the
>>>>>> ResponderProps alone for this PR and push them to a subsequent PR
>>>>>> (once we decide if and how we would like to delegate each).
>>>>>>
>>>>>> I vote to remove the 'platform' option and the backwards
>>>>>> compatibility in this PR and proceed with having a ResponderLauncher
>>>>>> interface and forcing its implementation by the ResponderDriver.
>>>>>>
>>>>>> And, I am not so concerned with having one fat jar vs. multiple jars
>>>>>> right now - to me, at this point, it's a 'nice to have' and not a
>>>>>> 'must have' for Pirk functionality. We do need to break out Pirk into
>>>>>> more clearly defined submodules (which is in progress) - via this
>>>>>> re-factor, I think that we will gain some ability to generate
>>>>>> multiple jars, which is nice.
>>>>>> On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On 19/09/16 15:46, Darin Johnson wrote:
>>>>>>>> Hey guys,
>>>>>>>>
>>>>>>>> Thanks for looking at the PR; I apologize if it offended anyone's
>>>>>>>> eyes :).
>>>>>>>>
>>>>>>>> I'm glad it generated some discussion about the configuration. I
>>>>>>>> didn't really like where things were heading with the config;
>>>>>>>> however, I didn't want to create too much scope creep.
>>>>>>>>
>>>>>>>> I think any hierarchical config (Typesafe or YAML) would make
>>>>>>>> things much more maintainable; the plugin could simply grab the
>>>>>>>> appropriate part of the config and handle it accordingly. I'd also
>>>>>>>> cut down the number of command-line options to only those that
>>>>>>>> change often between runs (like input/output).
>>>>>>>>
>>>>>>>>> One option is to make Pirk pluggable, so that a Pirk installation
>>>>>>>>> could use one or more of these in an extensible fashion by adding
>>>>>>>>> JAR files. That would still require selecting one by command-line
>>>>>>>>> argument.
>>>>>>>>
>>>>>>>> An argument for this approach is for lambda architecture approaches
>>>>>>>> (say spark/spark-streaming) where the contents of the jars would be
>>>>>>>> so similar it seems like too much trouble to create separate jars.
>>>>>>>>
>>>>>>>> Happy to continue working on this given some direction on where
>>>>>>>> you'd like it to go. Also, it's a bit of a blocker to refactoring
>>>>>>>> the build into submodules.
>>>>>>>
>>>>>>> FWIW, my 2c is to not try and fix all the problems in one go, and
>>>>>>> rather take a compromise on the configurations while you tease apart
>>>>>>> the submodules into separate source code trees, poms, etc; then come
>>>>>>> back and fix the runtime configs.
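[Editor's note: a sketch of the hierarchical config idea in Typesafe/HOCON syntax. All section and key names below are made up for illustration and are not part of Pirk; the point is that each platform plugin reads only its own subtree, while fast-changing values like input/output stay on the command line.]

```hocon
pirk {
  responder {
    # Which plugin's section to read; input/output paths would
    # remain command-line options since they change every run.
    platform = storm

    storm {
      workers = 4
      topology-name = "pirk-responder"
    }

    spark {
      master = "yarn"
      executor-memory = "4g"
    }
  }
}
```

A plugin would then call something like `config.getConfig("pirk.responder.storm")` and ignore the rest, which keeps each backend's settings out of the shared CLI surface.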
>>>>>>> Once the submodules are in place it will open up more work for
>>>>>>> release engineering and tinkering that can be done in parallel with
>>>>>>> the config polishing.
>>>>>>>
>>>>>>> Just a thought.
>>>>>>> Tim
>>>>>>>
>>>>>>>> On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 19/09/16 13:40, Ellison Anne Williams wrote:
>>>>>>>>>> It seems that it's the same idea as the ResponderLauncher with
>>>>>>>>>> the service component added to maintain something akin to the
>>>>>>>>>> 'platform'. I would prefer that we just did away with the
>>>>>>>>>> platform notion altogether and make the ResponderDriver 'dumb'.
>>>>>>>>>> We get around needing a platform-aware service by requiring the
>>>>>>>>>> ResponderLauncher implementation to be passed as a CLI argument
>>>>>>>>>> to the ResponderDriver.
>>>>>>>>>
>>>>>>>>> Let me check I understand what you are saying here.
>>>>>>>>>
>>>>>>>>> At the moment, there is a monolithic Pirk that hard-codes how to
>>>>>>>>> respond using lots of different backends (mapreduce, spark,
>>>>>>>>> sparkstreaming, storm, standalone), and that is selected by
>>>>>>>>> command-line argument.
>>>>>>>>>
>>>>>>>>> One option is to make Pirk pluggable, so that a Pirk installation
>>>>>>>>> could use one or more of these in an extensible fashion by adding
>>>>>>>>> JAR files. That would still require selecting one by command-line
>>>>>>>>> argument.
>>>>>>>>>
>>>>>>>>> A second option is to simply pass in the required backend JAR to
>>>>>>>>> select the particular implementation you choose, as a specific
>>>>>>>>> Pirk installation doesn't need to use multiple backends
>>>>>>>>> simultaneously.
>>>>>>>>>
>>>>>>>>> ...and you are leaning towards the second option. Do I have that
>>>>>>>>> correct?
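[Editor's note: the "dumb driver" idea — the ResponderLauncher implementation class being passed on the command line and loaded reflectively — might look like the following minimal sketch. The `ResponderLauncher` interface and the class names here are illustrative assumptions, not Pirk's actual API.]

```java
// Hypothetical SPI: each backend module would ship one implementation.
interface ResponderLauncher {
  String launch();
}

// Illustrative backend implementation, e.g. from a standalone module.
class StandaloneLauncher implements ResponderLauncher {
  public String launch() {
    return "standalone responder running";
  }
}

public class ResponderDriver {
  // The driver stays "dumb": it knows nothing about platforms, it just
  // reflectively instantiates whatever ResponderLauncher class name it
  // was handed on the command line.
  static ResponderLauncher load(String className) throws Exception {
    Class<?> cls = Class.forName(className);
    return (ResponderLauncher) cls.getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws Exception {
    String className = args.length > 0 ? args[0] : "StandaloneLauncher";
    ResponderLauncher launcher = load(className);
    System.out.println(launcher.launch());
  }
}
```

With this shape, removing the 'platform' option means the driver needs no registry at all; swapping backends is just a different `-launcher` class name (and JAR) on the invocation.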
>>>>>>>>> Regards,
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>>> Am I missing something? Is there a good reason to provide a
>>>>>>>>>> service by which platforms are registered? I'm open...
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> How about an approach like this?
>>>>>>>>>>> https://github.com/tellison/incubator-pirk/tree/pirk-63
>>>>>>>>>>>
>>>>>>>>>>> The "on-ramp" is the driver [1], which calls upon the service to
>>>>>>>>>>> find a plug-in [2] that claims to implement the required
>>>>>>>>>>> platform responder, e.g. [3].
>>>>>>>>>>>
>>>>>>>>>>> The list of plug-ins is given in the provider's JAR file, so the
>>>>>>>>>>> ones we provide in Pirk are listed together [4], but if you
>>>>>>>>>>> split these into modules, or somebody brings their own JAR
>>>>>>>>>>> alongside, these would be listed in each JAR's services/
>>>>>>>>>>> directory.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
>>>>>>>>>>> [2] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>>>>>>>>>>> [3] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
>>>>>>>>>>> [4] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/services/org.apache.responder.spi.Responder
>>>>>>>>>>>
>>>>>>>>>>> I'm not even going to dignify this with a WIP PR; it is far from
>>>>>>>>>>> ready, so proceed with caution. There is hopefully enough there
>>>>>>>>>>> to show the approach, and if it is worth continuing I'm happy to
>>>>>>>>>>> do so.
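[Editor's note: the service-based lookup described above can be sketched with `java.util.ServiceLoader`, which is what a services/ registration file in each provider JAR feeds. The `ResponderPlugin` interface below is an illustrative stand-in for the one on the pirk-63 branch, not its actual signature. Note that with no `META-INF/services` entry on the classpath, as in this self-contained snippet, the lookup legitimately comes back empty — discovery only works once provider JARs register their implementations.]

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical plug-in SPI along the lines of the pirk-63 branch.
interface ResponderPlugin {
  String getPlatformName();

  void run();
}

public class PluginFinder {
  // Ask ServiceLoader for every ResponderPlugin provider registered in a
  // META-INF/services file on the classpath, and pick the one claiming
  // to implement the requested platform.
  static Optional<ResponderPlugin> findPlugin(String platform) {
    for (ResponderPlugin plugin : ServiceLoader.load(ResponderPlugin.class)) {
      if (plugin.getPlatformName().equalsIgnoreCase(platform)) {
        return Optional.of(plugin);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // No provider JARs are registered here, so nothing is found.
    System.out.println(PluginFinder.findPlugin("storm").isPresent());
  }
}
```

This is the trade-off the thread is weighing: the ServiceLoader route keeps a platform name on the command line but lets any JAR register new backends, while the ResponderLauncher route drops the registry entirely and names the implementation class directly.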
>>>>>>>>>>> Regards,
>>>>>>>>>>> Tim
