Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Jean-Baptiste Onofré Sat, 26 Mar 2016 10:21:28 -0700

Hi Luciano,

From a purely technical standpoint, there are pros and cons to keeping spark-extra (or whatever name we give it) as an Apache project:


Pros:

- Governance & Quality Assurance: we follow the Apache rules, meaning that a release has to be staged and voted on by the PMC. It's a form of governance for the project and of quality control (as the releases are reviewed).
- Software origin: users know where the connector comes from, and they have guarantees in terms of licensing, etc.
- IP/ICLA: we know the committers of this project, and we know they have agreed to the ICLA.


Cons:

- Third-party license support. As an Apache project, the "connectors" will be allowed to use only Apache or Category B licensed dependencies. For instance, if I would like to create a Spark connector for Couchbase, I can't do it at Apache.
- Release cycle. As an Apache project, we have to follow the rules, meaning that the release cycle can appear strict and long due to the staging and vote process. For me, it's a huge benefit, but some may see it as too strict ;)


Maybe we can imagine doing both, as we have in ServiceMix or Camel:

- all modules/connectors matching the Apache rules (especially in terms of licensing) should be in the Apache Spark-Modules (or Spark-Extensions, or whatever). It's like the ServiceMix Bundles.
- all modules/connectors that can't fit the Apache rules (due to licensing issues) can go into GitHub Spark-Extra (or Spark-Package). It's like the ServiceMix Extra or Camel Extra on GitHub.


My $0.01.

Regards
JB

On 03/26/2016 06:07 PM, Luciano Resende wrote:

I believe some of this has been resolved in the context of some parties
that had an interest in one extra connector, but we still have a few
removed, and as you mentioned, we still don't have a simple way or
willingness to manage and stay current on new packages like kafka. And
based on the fact that this thread is still alive, I believe that other
community members might have other concerns as well.

After some thought, I believe having a separate project (what was
mentioned here as Spark Extras) to handle Spark Connectors and Spark
add-ons in general could be very beneficial to Spark and the overall
Spark community, which would have a central place in Apache to
collaborate around related Spark components.

Some of the benefits of this approach:

- Enables maintaining the connectors inside Apache, following the Apache
governance and release rules, while allowing Spark proper to focus on
the core runtime.
- Provides more flexibility in controlling the direction (currency) of
the existing connectors (e.g. being willing to find a solution and maintain
multiple versions of the same connector, like kafka 0.8x and 0.9x)
- Becomes a home for other types of Spark related connectors, helping
expand the community around Spark (e.g. Zeppelin sees most of its
current contributions around new/enhanced connectors)

What are some requirements for Spark Extras to be successful:

- Be up to date with Spark Trunk APIs (based on daily CIs against SNAPSHOT)
- Adhere to Spark release cycles (releasing within a very small window
after each Spark release)
- Be more open and flexible to the set of connectors it will accept and
maintain (e.g. also handle multiple versions like the kafka 0.9 issue we
have today)

Where to start Spark Extras

Depending on the interest here, we could follow the steps of (Apache
Arrow) and start this directly as a TLP, or start as an incubator
project. I would consider the first option first.

Who would participate

I have thought about this for a bit, and if we go in the direction of a TLP,
I would say Spark Committers and Apache Members can request to
participate as PMC members, while other committers can request to become
committers. Non-committers would be added based on meritocracy after the
start of the project.

Project Name

It would be ideal if we could have a project name that shows close ties
to Spark (e.g. Spark Extras or Spark Connectors) but we will need
permission and support from whoever is going to evaluate the project
proposal (e.g. Apache Board)


Thoughts ?

Does anyone have any big disagreement or objection to moving in this
direction ?

Otherwise, who would be interested in joining the project, so I can
start working on some concrete proposal ?



On Sat, Mar 26, 2016 at 6:58 AM, Sean Owen <so...@cloudera.com> wrote:

    This has been resolved; see the JIRA and related PRs but also
    
http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html

    This is not a scenario where a [VOTE] needs to take place, and code
    changes don't proceed through PMC votes. From the project perspective,
    code was deleted/retired for lack of interest, and this is controlled
    by the normal lazy consensus protocol which wasn't vetoed.

    The subsequent discussion was in part about whether other modules
    should go, or whether one should come back, which it did. The latter
    suggests that change could have been left open for some discussion
    longer. Ideally, you would have commented before the initial change
    happened, but it sounds like several people would have liked more
    time. I don't think I'd call that "improper conduct" though, no. It
    was reversed via the same normal code management process.

    The rest of the question concerned what becomes of the code that was
    removed. It was revived outside the project for anyone who cares to
    continue collaborating. There seemed to be no disagreement about that,
    mostly because the code in question was of minimal interest. PMC
    doesn't need to rule on anything. There may still be some loose ends
    there like namespace changes. I'll add to the other thread about this.



    On Sat, Mar 26, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl> wrote:
     > Hi,
     >
     > Although I'm not a very experienced member of the ASF, I share your
     > concerns. I haven't looked at the issue from this point of view, but
     > after having read the thread I think PMC should've signed off the
     > migration of ASF-owned code to a non-ASF repo. At least a vote is
     > required (and this discussion is a sign that the process has not been
     > conducted properly as people have concerns, myself included).
     >
     > Thanks Mridul!
     >
     > Regards,
     > Jacek Laskowski
     > ----
     > https://medium.com/@jaceklaskowski/
     > Mastering Apache Spark http://bit.ly/mastering-apache-spark
     > Follow me at https://twitter.com/jaceklaskowski
     >
     >
     > On Thu, Mar 17, 2016 at 9:13 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
     >> I am not referring to code edits - but to migrating submodules and
     >> code currently in Apache Spark to 'outside' of it.
     >> If I understand correctly, assets from Apache Spark are being moved
     >> out of it into thirdparty external repositories - not owned by Apache.
     >>
     >> At a minimum, dev@ discussion (like this one) should be initiated.
     >> As PMC is responsible for the project assets (including code), signoff
     >> is required for it IMO.
     >>
     >> More experienced Apache members might be able to opine better in case I
     >> got it wrong !
     >>
     >>
     >> Regards,
     >> Mridul
     >>
     >>
     >> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org> wrote:
     >>> Why would a PMC vote be necessary on every code deletion?
     >>>
     >>> There was a Jira and pull request discussion about the submodules that
     >>> have been removed so far.
     >>>
     >>> https://issues.apache.org/jira/browse/SPARK-13843
     >>>
     >>> There's another ongoing one about Kafka specifically
     >>>
     >>> https://issues.apache.org/jira/browse/SPARK-13877
     >>>
     >>>
     >>> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
     >>>>
     >>>> I was not aware of a discussion in Dev list about this - agree with most of
     >>>> the observations.
     >>>> In addition, I did not see PMC signoff on moving (sub-)modules out.
     >>>>
     >>>> Regards
     >>>> Mridul
     >>>>
     >>>>
     >>>>
     >>>> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com> wrote:
     >>>>>
     >>>>> Hello all,
     >>>>>
     >>>>> Recently a lot of the streaming backends were moved to a separate
     >>>>> project on github and removed from the main Spark repo.
     >>>>>
     >>>>> While I think the idea is great, I'm a little worried about the
     >>>>> execution. Some concerns were already raised on the bug mentioned
     >>>>> above, but I'd like to have a more explicit discussion about this so
     >>>>> things don't fall through the cracks.
     >>>>>
     >>>>> Mainly I have three concerns.
     >>>>>
     >>>>> i. Ownership
     >>>>>
     >>>>> That code used to be run by the ASF, but now it's hosted in a github
     >>>>> repo owned not by the ASF. That sounds a little sub-optimal, if not
     >>>>> problematic.
     >>>>>
     >>>>> ii. Governance
     >>>>>
     >>>>> Similar to the above; who has commit access to the above repos? Will
     >>>>> all the Spark committers, present and future, have commit access to
     >>>>> all of those repos? Are they still going to be considered part of
     >>>>> Spark and have release management done through the Spark community?
     >>>>>
     >>>>>
     >>>>> For both of the questions above, why are they not turned into
     >>>>> sub-projects of Spark and hosted on the ASF repos? I believe there is
     >>>>> a mechanism to do that, without the need to keep the code in the main
     >>>>> Spark repo, right?
     >>>>>
     >>>>> iii. Usability
     >>>>>
     >>>>> This is another thing I don't see discussed. For Scala-based code
     >>>>> things don't change much, I guess, if the artifact names don't change
     >>>>> (another reason to keep things in the ASF?), but what about python?
     >>>>> How are pyspark users expected to get that code going forward, since
     >>>>> it's not in Spark's pyspark.zip anymore?
     >>>>>
     >>>>>
     >>>>> Is there an easy way of keeping these things within the ASF Spark
     >>>>> project? I think that would be better for everybody.
     >>>>>
     >>>>> --
     >>>>> Marcelo
     >>>>>
     >>>>>
     >>>>> ---------------------------------------------------------------------
     >>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
     >>>>> For additional commands, e-mail: dev-h...@spark.apache.org
     >>>>>
     >>>>
     >>
     >>
     >>
     >
     >





--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

