On 21/09/16 03:22, Ellison Anne Williams wrote:
> I am in favor of breaking out pirk-core as specified so that our initial
> submodule structure would be as follows:
>
> pirk-core (encryption, query, inputformat, serialization, utils)
> pirk-responder (core responder incl. standalone)
> pirk-querier
> pirk-storm
> pirk-mapreduce
> pirk-spark
> pirk-benchmark
> pirk-distributed-test
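[Editor's note: for concreteness, the parent aggregator pom for the breakdown above might look something like this sketch. The module names come from the list in the proposal; the groupId/version coordinates are placeholders, not Pirk's actual ones.]

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Placeholder coordinates for illustration only -->
  <groupId>org.apache.pirk</groupId>
  <artifactId>pirk-parent</artifactId>
  <version>0.2.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <!-- One module per submodule in the proposed breakdown -->
  <modules>
    <module>pirk-core</module>
    <module>pirk-responder</module>
    <module>pirk-querier</module>
    <module>pirk-storm</module>
    <module>pirk-mapreduce</module>
    <module>pirk-spark</module>
    <module>pirk-benchmark</module>
    <module>pirk-distributed-test</module>
  </modules>
</project>
```

Under this layout the es-hadoop dependency discussed below would be declared only in the poms of the modules that need it (pirk-mapreduce, pirk-spark, pirk-distributed-test), keeping it out of pirk-core.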
Yes, I certainly wouldn't split it up any more than this yet.

> One thing to note is that under this breakdown, pirk-core would not include
> the Elasticsearch dependency (es-hadoop). The only submodules that would
> have the es-hadoop dependency (those which need it) currently are
> pirk-mapreduce, pirk-spark, and pirk-distributed-test.
>
> I believe that we agreed (somewhere :)) in this thread to go ahead and
> remove the platform 'backwards compatibility' for PIRK-63. Please holler if
> this is not correct.

I agree. While it is trivial to maintain that compatibility, it feels like
we are still in an era where we should use the freedom to drop it.

Regards,
Tim

> On Tue, Sep 20, 2016 at 9:40 PM, Darin Johnson <[email protected]>
> wrote:
>
>> Suneel, a google doc as promised, only a day late (sorry - sick kid).
>>
>> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_mMrRQyynQ-Q6MFbI/edit?usp=sharing
>>
>> I was planning on working on this, but I'm going to take a day or two to
>> let others comment.
>>
>> Darin
>>
>> On Mon, Sep 19, 2016 at 5:07 PM, Suneel Marthi <[email protected]>
>> wrote:
>>
>>> A shared Google doc would be more convenient than a bunch of Jiras. It's
>>> easier to comment and add notes that way.
>>>
>>> On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <[email protected]>
>>> wrote:
>>>
>>>> Suneel, I'll try to put a couple of jiras on it tonight with my
>>>> thoughts. Based off my pirk-63 work I was able to pull spark and storm
>>>> out with no issues. I was planning to pull them out, then tackle
>>>> Elasticsearch, then hadoop, as it's a little entrenched. This should
>>>> keep most PRs to manageable chunks. I think once that's done,
>>>> addressing the configs will make more sense.
>>>>
>>>> I'm open to suggestions.
>>>> But the hope would be:
>>>>
>>>> Pirk-parent
>>>> Pirk-core
>>>> Pirk-hadoop
>>>> Pirk-storm
>>>> Pirk-spark
>>>>
>>>> Pirk-es is a little weird as it's really just an inputformat; it seems
>>>> like there's a more general solution here than creating submodules for
>>>> every inputformat.
>>>>
>>>> Darin
>>>>
>>>> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <[email protected]> wrote:
>>>>
>>>>> Refactoring is definitely a first priority. Is there a design/proposal
>>>>> draft that we could comment on about how to go about refactoring the
>>>>> code? I have been trying to keep up with the emails but definitely
>>>>> would have missed some.
>>>>>
>>>>> On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Agree - let's leave the config/CLI the way it is for now and tackle
>>>>>> that as a subsequent design discussion and PR.
>>>>>>
>>>>>> Also, I think that we should leave the ResponderDriver and the
>>>>>> ResponderProps alone for this PR and push them to a subsequent PR
>>>>>> (once we decide if and how we would like to delegate each).
>>>>>>
>>>>>> I vote to remove the 'platform' option and the backwards
>>>>>> compatibility in this PR and proceed with having a ResponderLauncher
>>>>>> interface and forcing its implementation by the ResponderDriver.
>>>>>>
>>>>>> And, I am not so concerned with having one fat jar vs. multiple jars
>>>>>> right now - to me, at this point, it's a 'nice to have' and not a
>>>>>> 'must have' for Pirk functionality. We do need to break out Pirk into
>>>>>> more clearly defined submodules (which is in progress) - via this
>>>>>> re-factor, I think that we will gain some ability to generate
>>>>>> multiple jars, which is nice.
>>>>>> On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On 19/09/16 15:46, Darin Johnson wrote:
>>>>>>>> Hey guys,
>>>>>>>>
>>>>>>>> Thanks for looking at the PR; I apologize if it offended anyone's
>>>>>>>> eyes :).
>>>>>>>>
>>>>>>>> I'm glad it generated some discussion about the configuration. I
>>>>>>>> didn't really like where things were heading with the config;
>>>>>>>> however, I didn't want to create too much scope creep.
>>>>>>>>
>>>>>>>> I think any hierarchical config (Typesafe or YAML) would make
>>>>>>>> things much more maintainable; the plugin could simply grab the
>>>>>>>> appropriate part of the config and handle it accordingly. I'd also
>>>>>>>> cut down the number of command-line options to only those that
>>>>>>>> change often between runs (like input/output).
>>>>>>>>
>>>>>>>>> One option is to make Pirk pluggable, so that a Pirk installation
>>>>>>>>> could use one or more of these in an extensible fashion by adding
>>>>>>>>> JAR files. That would still require selecting one by command-line
>>>>>>>>> argument.
>>>>>>>>
>>>>>>>> An argument for this approach is for lambda architecture approaches
>>>>>>>> (say spark/spark-streaming) where the contents of the jars would be
>>>>>>>> so similar it seems like too much trouble to create separate jars.
>>>>>>>>
>>>>>>>> Happy to continue working on this given some direction on where
>>>>>>>> you'd like it to go. Also, it's a bit of a blocker to refactoring
>>>>>>>> the build into submodules.
>>>>>>>
>>>>>>> FWIW, my 2c is to not try and fix all the problems in one go, and
>>>>>>> rather take a compromise on the configurations while you tease apart
>>>>>>> the submodules into separate source code trees, poms, etc; then come
>>>>>>> back and fix the runtime configs.
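[Editor's note: a sketch of the hierarchical config idea in Typesafe/HOCON syntax. All section and key names below are made up for illustration and are not part of Pirk; the point is that each platform plugin reads only its own subtree, while fast-changing values like input/output stay on the command line.]

```hocon
pirk {
  responder {
    # Which plugin's section to read; input/output paths would
    # remain command-line options since they change every run.
    platform = storm

    storm {
      workers = 4
      topology-name = "pirk-responder"
    }

    spark {
      master = "yarn"
      executor-memory = "4g"
    }
  }
}
```

A plugin would then call something like `config.getConfig("pirk.responder.storm")` and ignore the rest, which keeps each backend's settings out of the shared CLI surface.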
>>>>>>> Once the submodules are in place it will open up more work for
>>>>>>> release engineering and tinkering that can be done in parallel with
>>>>>>> the config polishing.
>>>>>>>
>>>>>>> Just a thought.
>>>>>>> Tim
>>>>>>>
>>>>>>>> On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 19/09/16 13:40, Ellison Anne Williams wrote:
>>>>>>>>>> It seems that it's the same idea as the ResponderLauncher with
>>>>>>>>>> the service component added to maintain something akin to the
>>>>>>>>>> 'platform'. I would prefer that we just did away with the
>>>>>>>>>> platform notion altogether and make the ResponderDriver 'dumb'.
>>>>>>>>>> We get around needing a platform-aware service by requiring the
>>>>>>>>>> ResponderLauncher implementation to be passed as a CLI argument
>>>>>>>>>> to the ResponderDriver.
>>>>>>>>>
>>>>>>>>> Let me check I understand what you are saying here.
>>>>>>>>>
>>>>>>>>> At the moment, there is a monolithic Pirk that hard-codes how to
>>>>>>>>> respond using lots of different backends (mapreduce, spark,
>>>>>>>>> sparkstreaming, storm, standalone), and that is selected by
>>>>>>>>> command-line argument.
>>>>>>>>>
>>>>>>>>> One option is to make Pirk pluggable, so that a Pirk installation
>>>>>>>>> could use one or more of these in an extensible fashion by adding
>>>>>>>>> JAR files. That would still require selecting one by command-line
>>>>>>>>> argument.
>>>>>>>>>
>>>>>>>>> A second option is to simply pass in the required backend JAR to
>>>>>>>>> select the particular implementation you choose, as a specific
>>>>>>>>> Pirk installation doesn't need to use multiple backends
>>>>>>>>> simultaneously.
>>>>>>>>>
>>>>>>>>> ...and you are leaning towards the second option. Do I have that
>>>>>>>>> correct?
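[Editor's note: the "dumb driver" idea — the ResponderLauncher implementation class being passed on the command line and loaded reflectively — might look like the following minimal sketch. The `ResponderLauncher` interface and the class names here are illustrative assumptions, not Pirk's actual API.]

```java
// Hypothetical SPI: each backend module would ship one implementation.
interface ResponderLauncher {
  String launch();
}

// Illustrative backend implementation, e.g. from a standalone module.
class StandaloneLauncher implements ResponderLauncher {
  public String launch() {
    return "standalone responder running";
  }
}

public class ResponderDriver {
  // The driver stays "dumb": it knows nothing about platforms, it just
  // reflectively instantiates whatever ResponderLauncher class name it
  // was handed on the command line.
  static ResponderLauncher load(String className) throws Exception {
    Class<?> cls = Class.forName(className);
    return (ResponderLauncher) cls.getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws Exception {
    String className = args.length > 0 ? args[0] : "StandaloneLauncher";
    ResponderLauncher launcher = load(className);
    System.out.println(launcher.launch());
  }
}
```

With this shape, removing the 'platform' option means the driver needs no registry at all; swapping backends is just a different `-launcher` class name (and JAR) on the invocation.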
>>>>>>>>> Regards,
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>>> Am I missing something? Is there a good reason to provide a
>>>>>>>>>> service by which platforms are registered? I'm open...
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> How about an approach like this?
>>>>>>>>>>> https://github.com/tellison/incubator-pirk/tree/pirk-63
>>>>>>>>>>>
>>>>>>>>>>> The "on-ramp" is the driver [1], which calls upon the service to
>>>>>>>>>>> find a plug-in [2] that claims to implement the required
>>>>>>>>>>> platform responder, e.g. [3].
>>>>>>>>>>>
>>>>>>>>>>> The list of plug-ins is given in the provider's JAR file, so the
>>>>>>>>>>> ones we provide in Pirk are listed together [4], but if you
>>>>>>>>>>> split these into modules, or somebody brings their own JAR
>>>>>>>>>>> alongside, these would be listed in each JAR's services/
>>>>>>>>>>> directory.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
>>>>>>>>>>> [2] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>>>>>>>>>>> [3] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
>>>>>>>>>>> [4] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/services/org.apache.responder.spi.Responder
>>>>>>>>>>>
>>>>>>>>>>> I'm not even going to dignify this with a WIP PR; it is far from
>>>>>>>>>>> ready, so proceed with caution. There is hopefully enough there
>>>>>>>>>>> to show the approach, and if it is worth continuing I'm happy to
>>>>>>>>>>> do so.
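[Editor's note: the service-based lookup described above can be sketched with `java.util.ServiceLoader`, which is what a services/ registration file in each provider JAR feeds. The `ResponderPlugin` interface below is an illustrative stand-in for the one on the pirk-63 branch, not its actual signature. Note that with no `META-INF/services` entry on the classpath, as in this self-contained snippet, the lookup legitimately comes back empty — discovery only works once provider JARs register their implementations.]

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical plug-in SPI along the lines of the pirk-63 branch.
interface ResponderPlugin {
  String getPlatformName();

  void run();
}

public class PluginFinder {
  // Ask ServiceLoader for every ResponderPlugin provider registered in a
  // META-INF/services file on the classpath, and pick the one claiming
  // to implement the requested platform.
  static Optional<ResponderPlugin> findPlugin(String platform) {
    for (ResponderPlugin plugin : ServiceLoader.load(ResponderPlugin.class)) {
      if (plugin.getPlatformName().equalsIgnoreCase(platform)) {
        return Optional.of(plugin);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // No provider JARs are registered here, so nothing is found.
    System.out.println(PluginFinder.findPlugin("storm").isPresent());
  }
}
```

This is the trade-off the thread is weighing: the ServiceLoader route keeps a platform name on the command line but lets any JAR register new backends, while the ResponderLauncher route drops the registry entirely and names the implementation class directly.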
>>>>>>>>>>> Regards,
>>>>>>>>>>> Tim
