Hi Luciano,
If we take the "pure" technical view, there are pros and cons to keeping
spark-extra (or whatever name we give it) as an Apache project:
Pro:
- Governance & Quality Assurance: we follow the Apache rules, meaning
that a release has to be staged and voted on by the PMC. It's a form of
governance for the project and of quality control (as the releases are
reviewed).
- Software origin: users know where the connector comes from, and they
have guarantees in terms of licensing, etc.
- IP/ICLA: We know the committers of this project, and we know they
have agreed to the ICLA.
Cons:
- Third-party license support. As an Apache project, the "connectors" will
be allowed to use only Apache or Category B licensed dependencies. For
instance, if I wanted to create a Spark connector for Couchbase, I
couldn't do it at Apache.
- Release cycle. As an Apache project, we have to follow the rules,
meaning that the release cycle can appear strict and long due to the
staging and vote process. For me, it's a huge benefit, but some may
see it as too strict ;)
Maybe we can imagine both, as we have in ServiceMix or Camel:
- all modules/connectors matching the Apache rules (especially in terms of
licensing) should be in the Apache Spark-Modules (or Spark-Extensions,
or whatever). It's like the ServiceMix Bundles.
- all modules/connectors that can't fit into the Apache rules (due to
licensing issues) can go into GitHub Spark-Extra (or Spark-Package). It's
like ServiceMix Extra or Camel Extra on GitHub.
My $0.01.
Regards
JB
On 03/26/2016 06:07 PM, Luciano Resende wrote:
I believe some of this has been resolved in the context of some parties
that had an interest in one extra connector, but a few connectors are
still removed, and as you mentioned, we still don't have a simple way or
the willingness to manage and stay current on new packages like kafka. And
based on the fact that this thread is still alive, I believe that other
community members might have other concerns as well.
After some thought, I believe having a separate project (what was
mentioned here as Spark Extras) to handle Spark Connectors and Spark
add-ons in general could be very beneficial to Spark and the overall
Spark community, which would have a central place in Apache to
collaborate around related Spark components.
Some of the benefits of this approach:
- Enables maintaining the connectors inside Apache, following the Apache
governance and release rules, while allowing Spark proper to focus on
the core runtime.
- Provides more flexibility in controlling the direction (currency) of
the existing connectors (e.g. a willingness to find a solution and maintain
multiple versions of the same connector, like kafka 0.8x and 0.9x)
- Becomes a home for other types of Spark-related connectors, helping
expand the community around Spark (e.g. Zeppelin sees most of its
current contributions around new/enhanced connectors)
Some requirements for Spark Extras to be successful:
- Be up to date with Spark Trunk APIs (based on daily CIs against SNAPSHOT)
- Adhere to Spark release cycles (release within a very small window
after each Spark release)
- Be more open and flexible to the set of connectors it will accept and
maintain (e.g. also handle multiple versions like the kafka 0.9 issue we
have today)
Where to start Spark Extras
Depending on the interest here, we could follow the steps of Apache
Arrow and start this directly as a TLP, or start as an incubator
project. I would consider the first option first.
Who would participate
I have thought about this for a bit, and if we go in the direction of a
TLP, I would say Spark committers and Apache Members can request to
participate as PMC members, while other committers can request to become
committers. Non-committers would be added based on meritocracy after the
start of the project.
Project Name
It would be ideal if we could have a project name that shows close ties
to Spark (e.g. Spark Extras or Spark Connectors), but we will need
permission and support from whoever is going to evaluate the project
proposal (e.g. the Apache Board).
Thoughts?
Does anyone have any big disagreement with or objection to moving in this
direction?
Otherwise, who would be interested in joining the project, so I can
start working on a concrete proposal?
On Sat, Mar 26, 2016 at 6:58 AM, Sean Owen <so...@cloudera.com> wrote:
This has been resolved; see the JIRA and related PRs but also
http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html
This is not a scenario where a [VOTE] needs to take place, and code
changes don't proceed through PMC votes. From the project perspective,
code was deleted/retired for lack of interest, and this is controlled
by the normal lazy consensus protocol which wasn't vetoed.
The subsequent discussion was in part about whether other modules
should go, or whether one should come back, which it did. The latter
suggests that the change could have been left open for discussion a bit
longer. Ideally, you would have commented before the initial change
happened, but it sounds like several people would have liked more
time. I don't think I'd call that "improper conduct" though, no. It
was reversed via the same normal code management process.
The rest of the question concerned what becomes of the code that was
removed. It was revived outside the project for anyone who cares to
continue collaborating. There seemed to be no disagreement about that,
mostly because the code in question was of minimal interest. The PMC
doesn't need to rule on anything. There may still be some loose ends
there like namespace changes. I'll add to the other thread about this.
On Sat, Mar 26, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> Although I'm not a very experienced member of the ASF, I share your
> concerns. I haven't looked at the issue from this point of view, but
> after having read the thread I think the PMC should've signed off on the
> migration of ASF-owned code to a non-ASF repo. At least a vote is
> required (and this discussion is a sign that the process has not been
> conducted properly, as people have concerns, myself included).
>
> Thanks Mridul!
>
> Best regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Mar 17, 2016 at 9:13 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> I am not referring to code edits - but to migrating submodules and
>> code currently in Apache Spark to 'outside' of it.
>> If I understand correctly, assets from Apache Spark are being moved
>> out of it into third-party external repositories not owned by Apache.
>>
>> At a minimum, a dev@ discussion (like this one) should be initiated.
>> As the PMC is responsible for the project assets (including code),
>> signoff is required for it IMO.
>>
>> More experienced Apache members might be able to opine better in case
>> I got it wrong!
>>
>>
>> Regards,
>> Mridul
>>
>>
>> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>> Why would a PMC vote be necessary on every code deletion?
>>>
>>> There was a Jira and pull request discussion about the submodules
>>> that have been removed so far.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-13843
>>>
>>> There's another ongoing one about Kafka specifically
>>>
>>> https://issues.apache.org/jira/browse/SPARK-13877
>>>
>>>
>>> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>
>>>> I was not aware of a discussion on the dev list about this - I agree
>>>> with most of the observations.
>>>> In addition, I did not see PMC signoff on moving (sub-)modules out.
>>>>
>>>> Regards
>>>> Mridul
>>>>
>>>>
>>>>
>>>> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> Recently a lot of the streaming backends were moved to a separate
>>>>> project on github and removed from the main Spark repo.
>>>>>
>>>>> While I think the idea is great, I'm a little worried about the
>>>>> execution. Some concerns were already raised on the bug mentioned
>>>>> above, but I'd like to have a more explicit discussion about this so
>>>>> things don't fall through the cracks.
>>>>>
>>>>> Mainly I have three concerns.
>>>>>
>>>>> i. Ownership
>>>>>
>>>>> That code used to be run by the ASF, but now it's hosted in a GitHub
>>>>> repo not owned by the ASF. That sounds a little sub-optimal, if not
>>>>> problematic.
>>>>>
>>>>> ii. Governance
>>>>>
>>>>> Similar to the above; who has commit access to the above repos? Will
>>>>> all the Spark committers, present and future, have commit access to
>>>>> all of those repos? Are they still going to be considered part of
>>>>> Spark and have release management done through the Spark community?
>>>>>
>>>>>
>>>>> For both of the questions above, why are they not turned into
>>>>> sub-projects of Spark and hosted on the ASF repos? I believe there is
>>>>> a mechanism to do that, without the need to keep the code in the main
>>>>> Spark repo, right?
>>>>>
>>>>> iii. Usability
>>>>>
>>>>> This is another thing I don't see discussed. For Scala-based code
>>>>> things don't change much, I guess, if the artifact names don't change
>>>>> (another reason to keep things in the ASF?), but what about Python?
>>>>> How are pyspark users expected to get that code going forward, since
>>>>> it's not in Spark's pyspark.zip anymore?
>>>>>
>>>>>
>>>>> Is there an easy way of keeping these things within the ASF Spark
>>>>> project? I think that would be better for everybody.
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>>
---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
<mailto:dev-unsubscr...@spark.apache.org>
>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
<mailto:dev-h...@spark.apache.org>
>>>>>
>>>>
>>
>>
>>
>
>
--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com