On Sat, Mar 26, 2016 at 10:20 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi Luciano,
If we take the "pure" technical view, there are pros and cons to
keeping spark-extra (or whatever name we give it) as an Apache project:

Pros:
- Governance & quality assurance: we follow the Apache rules, meaning
that a release has to be staged and voted on by the PMC. It's a form of
governance of the project and of quality (as the releases are reviewed).
- Software origin: users know where the connector comes from, and they
have guarantees in terms of licensing, etc.
- IP/ICLA: we know the committers of this project, and we know they
have agreed to the ICLA.
Cons:
- Third-party license support: as an Apache project, the "connectors"
will be allowed to use only Apache or Category B licensed dependencies.
For instance, if I wanted to create a Spark connector for Couchbase, I
couldn't do it at Apache.

Yes, this is not solving the incompatible license problem.

- Release cycle: as an Apache project, we have to follow the rules,
meaning that the release cycle can appear strict and long due to the
staging and vote process. For me, it's a huge benefit, but some may see
it as too strict ;)
IMHO, this is the small price we pay for all the good stuff you
mentioned in the pros.
Maybe we can imagine both, as we have in ServiceMix or Camel:
- all modules/connectors matching the Apache rules (especially in
terms of licensing) would be in Apache Spark-Modules (or
Spark-Extensions, or whatever). It's like the ServiceMix Bundles.
If you are talking here about Spark proper, then we are currently seeing
that this is going to be hard. If there were a way to have more
flexibility to host these directly in Spark proper, I would never have
created this thread, as we would have all the pros you mentioned by
hosting them directly in Spark.
- all modules/connectors that can't fit the Apache rules (due to
licensing issues) can go into a GitHub Spark-Extra (or Spark-Package).
It's like ServiceMix Extra or Camel Extra on GitHub.
We could look into this, but it might be a "Spark Extras discussion" on
how we can help foster a community around the connectors with
incompatible licenses.
My $0.01.
Regards
JB
On 03/26/2016 06:07 PM, Luciano Resende wrote:
I believe some of this has been resolved in the context of some parties
that had an interest in one extra connector, but we still have a few
removed, and, as you mentioned, we still don't have a simple way, or the
willingness, to manage and stay current on new packages like Kafka. And
based on the fact that this thread is still alive, I believe that other
community members might have other concerns as well.
After some thought, I believe having a separate project (what was
mentioned here as Spark Extras) to handle Spark connectors and Spark
add-ons in general could be very beneficial to Spark and the overall
Spark community, which would gain a central place in Apache to
collaborate around related Spark components.
Some of the benefits of this approach:
- Enables maintaining the connectors inside Apache, following the
Apache governance and release rules, while allowing Spark proper to
focus on the core runtime.
- Provides more flexibility in controlling the direction (currency) of
the existing connectors (e.g. being willing to find a solution and
maintain multiple versions of the same connector, like Kafka 0.8.x and
0.9.x).
- Becomes a home for other types of Spark-related connectors, helping
expand the community around Spark (e.g. Zeppelin sees most of its
current contributions around new/enhanced connectors).
What are some requirements for Spark Extras to be successful:
- Be up to date with Spark trunk APIs (based on daily CI runs against
SNAPSHOT).
- Adhere to Spark release cycles (with a very small window compared to
the Spark release).
- Be more open and flexible about the set of connectors it will accept
and maintain (e.g. also handle multiple versions, like the Kafka 0.9
issue we have today).
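Supporting multiple versions of the same connector side by side could be
done with one module per upstream version line, each publishing its own
artifact. A minimal Maven sketch of the idea; the module names,
directory layout, and version lines here are hypothetical, not an agreed
design:

```xml
<!-- Hypothetical parent POM fragment for a "Spark Extras" layout where
     each Kafka client line lives in its own module and publishes its
     own artifact, so users pick the one matching their broker version. -->
<modules>
  <!-- module compiled against the Kafka 0.8.2.x client -->
  <module>streaming-kafka-0-8</module>
  <!-- module compiled against the Kafka 0.9.0.x client -->
  <module>streaming-kafka-0-9</module>
</modules>
```

With that layout, both artifacts could be released together on the same
cycle while depending on incompatible Kafka client versions.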
Where to start Spark Extras:

Depending on the interest here, we could follow the steps of Apache
Arrow and start this directly as a TLP, or start as an Incubator
project. I would consider the first option first.
Who would participate:

I have thought about this for a bit, and if we go in the direction of a
TLP, I would say Spark committers and Apache members can request to
participate as PMC members, while other committers can request to become
committers. Non-committers would be added based on meritocracy after the
start of the project.
Project name:

It would be ideal if we could have a project name that shows close ties
to Spark (e.g. Spark Extras or Spark Connectors), but we will need
permission and support from whoever is going to evaluate the project
proposal (e.g. the Apache Board).
Thoughts?

Does anyone have any big disagreement or objection to moving in this
direction? Otherwise, who would be interested in joining the project, so
I can start working on a concrete proposal?
On Sat, Mar 26, 2016 at 6:58 AM, Sean Owen <so...@cloudera.com> wrote:
This has been resolved; see the JIRA and related PRs, but also
http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html

This is not a scenario where a [VOTE] needs to take place, and code
changes don't proceed through PMC votes. From the project perspective,
code was deleted/retired for lack of interest, and this is controlled by
the normal lazy consensus protocol, which wasn't vetoed.
The subsequent discussion was in part about whether other modules should
go, or whether one should come back, which it did. The latter suggests
that the change could have been left open for discussion longer.
Ideally, you would have commented before the initial change happened,
but it sounds like several people would have liked more time. I don't
think I'd call that "improper conduct" though, no. It was reversed via
the same normal code management process.
The rest of the question concerned what becomes of the code that was
removed. It was revived outside the project for anyone who cares to
continue collaborating. There seemed to be no disagreement about that,
mostly because the code in question was of minimal interest. The PMC
doesn't need to rule on anything. There may still be some loose ends
there, like namespace changes. I'll add to the other thread about this.
On Sat, Mar 26, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> Although I'm not a very experienced member of the ASF, I share your
> concerns. I hadn't looked at the issue from this point of view, but
> after having read the thread I think the PMC should've signed off on
> the migration of ASF-owned code to a non-ASF repo. At least a vote is
> required (and this discussion is a sign that the process has not been
> conducted properly, as people have concerns, myself included).
>
> Thanks Mridul!
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Mar 17, 2016 at 9:13 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> I am not referring to code edits - but to migrating submodules and
>> code currently in Apache Spark to 'outside' of it.
>> If I understand correctly, assets from Apache Spark are being moved
>> out of it into third-party external repositories - not owned by
>> Apache.
>>
>> At a minimum, a dev@ discussion (like this one) should be initiated.
>> As the PMC is responsible for the project assets (including code),
>> signoff is required for it, IMO.
>>
>> More experienced Apache members might opine better in case I got it
>> wrong!
>>
>>
>> Regards,
>> Mridul
>>
>>
>> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>> Why would a PMC vote be necessary on every code deletion?
>>>
>>> There was a JIRA and pull request discussion about the submodules
>>> that have been removed so far.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-13843
>>>
>>> There's another ongoing one about Kafka specifically
>>>
>>> https://issues.apache.org/jira/browse/SPARK-13877
>>>
>>>
>>> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>
>>>> I was not aware of a discussion on the dev list about this - I
>>>> agree with most of the observations.
>>>> In addition, I did not see a PMC signoff on moving (sub-)modules
>>>> out.
>>>>
>>>> Regards
>>>> Mridul
>>>>
>>>>
>>>>
>>>> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> Recently a lot of the streaming backends were moved to a separate
>>>>> project on GitHub and removed from the main Spark repo.
>>>>>
>>>>> While I think the idea is great, I'm a little worried about the
>>>>> execution. Some concerns were already raised on the bug mentioned
>>>>> above, but I'd like to have a more explicit discussion about this
>>>>> so things don't fall through the cracks.
>>>>>
>>>>> Mainly I have three concerns.
>>>>>
>>>>> i. Ownership
>>>>>
>>>>> That code used to be run by the ASF, but now it's hosted in a
>>>>> GitHub repo not owned by the ASF. That sounds a little suboptimal,
>>>>> if not problematic.
>>>>>
>>>>> ii. Governance
>>>>>
>>>>> Similar to the above; who has commit access to the above repos?
>>>>> Will all the Spark committers, present and future, have commit
>>>>> access to all of those repos? Are they still going to be
>>>>> considered part of Spark and have release management done through
>>>>> the Spark community?
>>>>>
>>>>> For both of the questions above, why are they not turned into
>>>>> sub-projects of Spark and hosted on the ASF repos? I believe there
>>>>> is a mechanism to do that, without the need to keep the code in
>>>>> the main Spark repo, right?
>>>>>
>>>>> iii. Usability
>>>>>
>>>>> This is another thing I don't see discussed. For Scala-based code
>>>>> things don't change much, I guess, if the artifact names don't
>>>>> change (another reason to keep things in the ASF?), but what about
>>>>> Python? How are pyspark users expected to get that code going
>>>>> forward, since it's not in Spark's pyspark.zip anymore?
>>>>>
>>>>>
>>>>> Is there an easy way of keeping these things within the ASF Spark
>>>>> project? I think that would be better for everybody.
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>
>>>>
>>
>>
>>
>
>
>
--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/