On Sat, Mar 26, 2016 at 10:20 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
Hi Luciano,
If we take the "pure" technical view, there are pros and cons to
keeping spark-extra (or whatever name we give it) as an Apache project:

Pros:
- Governance & quality assurance: we follow the Apache rules, meaning
that a release has to be staged and voted on by the PMC. It's a form of
governance of the project and of quality (as the releases are reviewed).
- Software origin: users know where the connector comes from, and they
have guarantees in terms of licensing, etc.
- IP/ICLA: we know the committers of this project, and we know they
have agreed to the ICLA.
Cons:
- Third-party license support: as an Apache project, the "connectors"
will be allowed to use only Apache or Category B licensed dependencies.
For instance, if I wanted to create a Spark connector for Couchbase, I
couldn't do it at Apache.

Yes, this is not solving the incompatible license problem.

- Release cycle: as an Apache project, we have to follow the rules,
meaning that the release cycle can appear strict and long due to the
staging and vote process. For me, it's a huge benefit, but some may see
it as too strict ;)
IMHO, this is the small price we pay for all the good stuff you
mentioned in the pros.
Maybe we can imagine both, as we have in ServiceMix or Camel:
- all modules/connectors matching the Apache rules (especially in
terms of licensing) would be in Apache Spark-Modules (or
Spark-Extensions, or whatever). It's like the ServiceMix Bundles.
If you are talking here about Spark proper, then we are currently seeing
that this is going to be hard. If there were a way to have more
flexibility to host these directly in Spark proper, I would never have
created this thread, as we would have all the pros you mentioned by
hosting them directly in Spark.
- all modules/connectors that can't fit the Apache rules (due to
licensing issues) can go into a GitHub Spark-Extra (or Spark-Package).
It's like ServiceMix Extra or Camel Extra on GitHub.
We could look into this, but it might be a "Spark Extras discussion" on
how we can help foster a community around the connectors with
incompatible licenses.
My $0.01.
Regards
JB
On 03/26/2016 06:07 PM, Luciano Resende wrote:
I believe some of this has been resolved in the context of some parties
that had an interest in one extra connector, but we still have a few
removed, and, as you mentioned, we still don't have a simple way, or the
willingness, to manage and stay current on new packages like Kafka. And
based on the fact that this thread is still alive, I believe that other
community members might have other concerns as well.
After some thought, I believe having a separate project (what was
mentioned here as Spark Extras) to handle Spark connectors and Spark
add-ons in general could be very beneficial to Spark and the overall
Spark community, which would gain a central place in Apache to
collaborate around related Spark components.
Some of the benefits of this approach:
- Enables maintaining the connectors inside Apache, following the
Apache governance and release rules, while allowing Spark proper to
focus on the core runtime.
- Provides more flexibility in controlling the direction (currency) of
the existing connectors (e.g. being willing to find a solution and
maintain multiple versions of the same connector, like Kafka 0.8.x and
0.9.x).
- Becomes a home for other types of Spark-related connectors, helping
expand the community around Spark (e.g. Zeppelin sees most of its
current contributions around new/enhanced connectors).
What are some requirements for Spark Extras to be successful:
- Be up to date with Spark trunk APIs (based on daily CI runs against
SNAPSHOT).
- Adhere to Spark release cycles (with a very small window compared to
the Spark release).
- Be more open and flexible about the set of connectors it will accept
and maintain (e.g. also handle multiple versions, like the Kafka 0.9
issue we have today).
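Supporting multiple versions of the same connector side by side could be
done with one module per upstream version line, each publishing its own
artifact. A minimal Maven sketch of the idea; the module names,
directory layout, and version lines here are hypothetical, not an agreed
design:

```xml
<!-- Hypothetical parent POM fragment for a "Spark Extras" layout where
     each Kafka client line lives in its own module and publishes its
     own artifact, so users pick the one matching their broker version. -->
<modules>
  <!-- module compiled against the Kafka 0.8.2.x client -->
  <module>streaming-kafka-0-8</module>
  <!-- module compiled against the Kafka 0.9.0.x client -->
  <module>streaming-kafka-0-9</module>
</modules>
```

With that layout, both artifacts could be released together on the same
cycle while depending on incompatible Kafka client versions.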
Where to start Spark Extras:

Depending on the interest here, we could follow the steps of Apache
Arrow and start this directly as a TLP, or start as an Incubator
project. I would consider the first option first.
Who would participate:

I have thought about this for a bit, and if we go in the direction of a
TLP, I would say Spark committers and Apache members can request to
participate as PMC members, while other committers can request to become
committers. Non-committers would be added based on meritocracy after the
start of the project.
Project name:

It would be ideal if we could have a project name that shows close ties
to Spark (e.g. Spark Extras or Spark Connectors), but we will need
permission and support from whoever is going to evaluate the project
proposal (e.g. the Apache Board).
Thoughts?

Does anyone have any big disagreement or objection to moving in this
direction? Otherwise, who would be interested in joining the project, so
I can start working on a concrete proposal?
On Sat, Mar 26, 2016 at 6:58 AM, Sean Owen <so...@cloudera.com> wrote:
This has been resolved; see the JIRA and related PRs, but also
http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html

This is not a scenario where a [VOTE] needs to take place, and code
changes don't proceed through PMC votes. From the project perspective,
code was deleted/retired for lack of interest, and this is controlled by
the normal lazy consensus protocol, which wasn't vetoed.
The subsequent discussion was in part about whether other modules should
go, or whether one should come back, which it did. The latter suggests
that the change could have been left open for discussion longer.
Ideally, you would have commented before the initial change happened,
but it sounds like several people would have liked more time. I don't
think I'd call that "improper conduct" though, no. It was reversed via
the same normal code management process.
The rest of the question concerned what becomes of the code that was
removed. It was revived outside the project for anyone who cares to
continue collaborating. There seemed to be no disagreement about that,
mostly because the code in question was of minimal interest. The PMC
doesn't need to rule on anything. There may still be some loose ends
there, like namespace changes. I'll add to the other thread about this.
On Sat, Mar 26, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> Although I'm not a very experienced member of the ASF, I share your
> concerns. I hadn't looked at the issue from this point of view, but
> after having read the thread I think the PMC should've signed off on
> the migration of ASF-owned code to a non-ASF repo. At least a vote is
> required (and this discussion is a sign that the process has not been
> conducted properly, as people have concerns, myself included).
>
> Thanks Mridul!
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Mar 17, 2016 at 9:13 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> I am not referring to code edits - but to migrating submodules and
>> code currently in Apache Spark to 'outside' of it.
>> If I understand correctly, assets from Apache Spark are being moved
>> out of it into third-party external repositories - not owned by
>> Apache.
>>
>> At a minimum, a dev@ discussion (like this one) should be initiated.
>> As the PMC is responsible for the project assets (including code),
>> signoff is required for it, IMO.
>>
>> More experienced Apache members might opine better in case I got it
>> wrong!
>>
>>
>> Regards,
>> Mridul
>>
>>
>> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>> Why would a PMC vote be necessary on every code deletion?
>>>
>>> There was a JIRA and pull request discussion about the submodules
>>> that have been removed so far.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-13843
>>>
>>> There's another ongoing one about Kafka specifically
>>>
>>> https://issues.apache.org/jira/browse/SPARK-13877
>>>
>>>
>>> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>
>>>> I was not aware of a discussion on the dev list about this - I
>>>> agree with most of the observations.
>>>> In addition, I did not see a PMC signoff on moving (sub-)modules
>>>> out.
>>>>
>>>> Regards
>>>> Mridul
>>>>
>>>>
>>>>
>>>> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> Recently a lot of the streaming backends were moved to a separate
>>>>> project on GitHub and removed from the main Spark repo.
>>>>>
>>>>> While I think the idea is great, I'm a little worried about the
>>>>> execution. Some concerns were already raised on the bug mentioned
>>>>> above, but I'd like to have a more explicit discussion about this
>>>>> so things don't fall through the cracks.
>>>>>
>>>>> Mainly I have three concerns.
>>>>>
>>>>> i. Ownership
>>>>>
>>>>> That code used to be run by the ASF, but now it's hosted in a
>>>>> GitHub repo not owned by the ASF. That sounds a little suboptimal,
>>>>> if not problematic.
>>>>>
>>>>> ii. Governance
>>>>>
>>>>> Similar to the above; who has commit access to the above repos?
>>>>> Will all the Spark committers, present and future, have commit
>>>>> access to all of those repos? Are they still going to be
>>>>> considered part of Spark and have release management done through
>>>>> the Spark community?
>>>>>
>>>>> For both of the questions above, why are they not turned into
>>>>> sub-projects of Spark and hosted on the ASF repos? I believe there
>>>>> is a mechanism to do that, without the need to keep the code in
>>>>> the main Spark repo, right?
>>>>>
>>>>> iii. Usability
>>>>>
>>>>> This is another thing I don't see discussed. For Scala-based code
>>>>> things don't change much, I guess, if the artifact names don't
>>>>> change (another reason to keep things in the ASF?), but what about
>>>>> Python? How are pyspark users expected to get that code going
>>>>> forward, since it's not in Spark's pyspark.zip anymore?
>>>>>
>>>>>
>>>>> Is there an easy way of keeping these things within the ASF Spark
>>>>> project? I think that would be better for everybody.
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>
>>>>
>>
>>
>>
>
>
>
--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/