Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

Luciano Resende Sat, 26 Mar 2016 10:08:18 -0700

I believe some of this has been resolved for the parties that had an
interest in one particular connector, but a few connectors remain removed,
and as you mentioned, we still don't have a simple way, or the willingness,
to manage and stay current on new packages like Kafka. And given that this
thread is still alive, I believe that other community members might have
other concerns as well.


After some thought, I believe having a separate project (what was mentioned
here as Spark Extras) to handle Spark Connectors and Spark add-ons in
general could be very beneficial to Spark and the overall Spark community,
which would have a central place in Apache to collaborate around related
Spark components.

Some of the benefits of this approach:

- Enables maintaining the connectors inside Apache, following the Apache
governance and release rules, while allowing Spark proper to focus on the
core runtime.
- Provides more flexibility in controlling the direction (currency) of the
existing connectors (e.g. being willing to find a solution for maintaining
multiple versions of the same connector, like Kafka 0.8.x and 0.9.x)
- Becomes a home for other types of Spark-related connectors, helping to
expand the community around Spark (e.g. Zeppelin sees most of its
current contributions around new/enhanced connectors)

What are some requirements for Spark Extras to be successful:

- Be up to date with Spark trunk APIs (based on daily CI builds against
SNAPSHOT)
- Adhere to Spark release cycles (with only a very small window relative to
each Spark release)
- Be more open and flexible about the set of connectors it will accept and
maintain (e.g. also handle multiple versions, like the Kafka 0.9 issue we
have today)

Where to start Spark Extras

Depending on the interest here, we could follow in the steps of Apache Arrow
and start this directly as a TLP, or start as an incubator project. I would
consider the first option first.

Who would participate

I have thought about this for a bit, and if we go in the direction of a TLP,
I would say Spark Committers and Apache Members can request to participate as
PMC members, while other committers can request to become committers.
Non-committers would be added based on merit after the start of the
project.

Project Name

It would be ideal if we could have a project name that shows close ties to
Spark (e.g. Spark Extras or Spark Connectors) but we will need permission
and support from whoever is going to evaluate the project proposal (e.g.
Apache Board)


Thoughts?

Does anyone have any big disagreement with or objection to moving in this
direction?

Otherwise, who would be interested in joining the project, so I can start
working on a concrete proposal?



On Sat, Mar 26, 2016 at 6:58 AM, Sean Owen <so...@cloudera.com> wrote:

> This has been resolved; see the JIRA and related PRs but also
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html
>
> This is not a scenario where a [VOTE] needs to take place, and code
> changes don't proceed through PMC votes. From the project perspective,
> code was deleted/retired for lack of interest, and this is controlled
> by the normal lazy consensus protocol which wasn't vetoed.
>
> The subsequent discussion was in part about whether other modules
> should go, or whether one should come back, which it did. The latter
> suggests that change could have been left open for some discussion
> longer. Ideally, you would have commented before the initial change
> happened, but it sounds like several people would have liked more
> time. I don't think I'd call that "improper conduct" though, no. It
> was reversed via the same normal code management process.
>
> The rest of the question concerned what becomes of the code that was
> removed. It was revived outside the project for anyone who cares to
> continue collaborating. There seemed to be no disagreement about that,
> mostly because the code in question was of minimal interest. PMC
> doesn't need to rule on anything. There may still be some loose ends
> there like namespace changes. I'll add to the other thread about this.
>
>
>
> On Sat, Mar 26, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> > Hi,
> >
> > Although I'm not a very experienced member of the ASF, I share your
> > concerns. I hadn't looked at the issue from this point of view, but
> > after having read the thread I think the PMC should've signed off on the
> > migration of ASF-owned code to a non-ASF repo. At least a vote is
> > required (and this discussion is a sign that the process has not been
> > conducted properly, as people have concerns, myself included).
> >
> > Thanks Mridul!
> >
> > Pozdrawiam,
> > Jacek Laskowski
> > ----
> > https://medium.com/@jaceklaskowski/
> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
> > Follow me at https://twitter.com/jaceklaskowski
> >
> >
> > On Thu, Mar 17, 2016 at 9:13 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >> I am not referring to code edits - but to migrating submodules and
> >> code currently in Apache Spark to 'outside' of it.
> >> If I understand correctly, assets from Apache Spark are being moved
> >> out of it into thirdparty external repositories - not owned by Apache.
> >>
> >> At a minimum, dev@ discussion (like this one) should be initiated.
> >> As PMC is responsible for the project assets (including code), signoff
> >> is required for it IMO.
> >>
> >> More experienced Apache members might be able to opine better in case
> >> I got it wrong!
> >>
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org> wrote:
> >>> Why would a PMC vote be necessary on every code deletion?
> >>>
> >>> There was a Jira and pull request discussion about the submodules that
> >>> have been removed so far.
> >>>
> >>> https://issues.apache.org/jira/browse/SPARK-13843
> >>>
> >>> There's another ongoing one about Kafka specifically
> >>>
> >>> https://issues.apache.org/jira/browse/SPARK-13877
> >>>
> >>>
> >>> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >>>>
> >>>> I was not aware of a discussion on the dev list about this - agree
> >>>> with most of the observations.
> >>>> In addition, I did not see PMC signoff on moving (sub-)modules out.
> >>>>
> >>>> Regards
> >>>> Mridul
> >>>>
> >>>>
> >>>>
> >>>> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com> wrote:
> >>>>>
> >>>>> Hello all,
> >>>>>
> >>>>> Recently a lot of the streaming backends were moved to a separate
> >>>>> project on github and removed from the main Spark repo.
> >>>>>
> >>>>> While I think the idea is great, I'm a little worried about the
> >>>>> execution. Some concerns were already raised on the bug mentioned
> >>>>> above, but I'd like to have a more explicit discussion about this so
> >>>>> things don't fall through the cracks.
> >>>>>
> >>>>> Mainly I have three concerns.
> >>>>>
> >>>>> i. Ownership
> >>>>>
> >>>>> That code used to be run by the ASF, but now it's hosted in a GitHub
> >>>>> repo not owned by the ASF. That sounds a little sub-optimal, if not
> >>>>> problematic.
> >>>>>
> >>>>> ii. Governance
> >>>>>
> >>>>> Similar to the above; who has commit access to the above repos? Will
> >>>>> all the Spark committers, present and future, have commit access to
> >>>>> all of those repos? Are they still going to be considered part of
> >>>>> Spark and have release management done through the Spark community?
> >>>>>
> >>>>>
> >>>>> For both of the questions above, why are they not turned into
> >>>>> sub-projects of Spark and hosted on the ASF repos? I believe there is
> >>>>> a mechanism to do that, without the need to keep the code in the main
> >>>>> Spark repo, right?
> >>>>>
> >>>>> iii. Usability
> >>>>>
> >>>>> This is another thing I don't see discussed. For Scala-based code
> >>>>> things don't change much, I guess, if the artifact names don't change
> >>>>> (another reason to keep things in the ASF?), but what about python?
> >>>>> How are pyspark users expected to get that code going forward, since
> >>>>> it's not in Spark's pyspark.zip anymore?
> >>>>>
> >>>>>
> >>>>> Is there an easy way of keeping these things within the ASF Spark
> >>>>> project? I think that would be better for everybody.
> >>>>>
> >>>>> --
> >>>>> Marcelo
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >>>>> For additional commands, e-mail: dev-h...@spark.apache.org
> >>>>>
> >>>>
> >>
> >
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
