On Sat, Mar 26, 2016 at 10:20 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Luciano,
>
> If we take the "pure" technical vision, there are pros and cons of having
> spark-extra (or whatever name we give it) still as an Apache project:
>
> Pro:
> - Governance & Quality Assurance: we follow the Apache rules, meaning
> that a release has to be staged and voted on by the PMC. It's a form of
> governance of the project and quality (as the releases are reviewed).
> - Software origin: users know where the connector comes from, and they
> have the guarantee in terms of licensing, etc.
> - IP/ICLA: We know the committers of this project, and we know they agree
> with the ICLA.
>
> Cons:
> - Third-party license support. As an Apache project, the "connectors" will
> be allowed to use only Apache or Category B licensed dependencies. For
> instance, if I would like to create a Spark connector for couchbase, I
> can't do it at Apache.
>

Yes, this does not solve the incompatible license problems.

> - Release cycle. As an Apache project, it means we have to follow the
> rules, meaning that the release cycle can appear strict and long due to the
> staging and vote process. For me, it's a huge benefit but some may see it
> as too strict ;)
>

IMHO, this is the small price we pay for all the good stuff you mentioned
in the pros.

> Maybe, we can imagine both, as we have in ServiceMix or Camel:
> - all modules/connectors matching the Apache rules (especially in terms of
> licensing) should be in the Apache Spark-Modules (or Spark-Extensions, or
> whatever). It's like the ServiceMix Bundles.
>

If you are talking here about Spark proper, then we are currently seeing
that this is going to be hard. If there were a way to have more flexibility
to host these directly in Spark proper, I would never have created this
thread, as we would have all the pros you mentioned by hosting them directly
in Spark.

> - all modules/connectors that can't fit into the Apache rules (due to
> licensing issues) can go into GitHub Spark-Extra (or Spark-Package).
> It's like the ServiceMix Extra or Camel Extra on github.
>

We could look into this, but it might be a "Spark Extra discussion" on how
we can help foster a community around the connectors with incompatible
licenses.

> My $0.01.
>
> Regards
> JB
>
> On 03/26/2016 06:07 PM, Luciano Resende wrote:
>
>> I believe some of this has been resolved in the context of some parts
>> that had interest in one extra connector, but we still have a few
>> removed, and as you mentioned, we still don't have a simple way or
>> willingness to manage and be current on new packages like kafka. And
>> based on the fact that this thread is still alive, I believe that other
>> community members might have other concerns as well.
>>
>> After some thought, I believe having a separate project (what was
>> mentioned here as Spark Extras) to handle Spark Connectors and Spark
>> add-ons in general could be very beneficial to Spark and the overall
>> Spark community, which would have a central place in Apache to
>> collaborate around related Spark components.
>>
>> Some of the benefits of this approach
>>
>> - Enables maintaining the connectors inside Apache, following the Apache
>> governance and release rules, while allowing Spark proper to focus on
>> the core runtime.
>> - Provides more flexibility in controlling the direction (currency) of
>> the existing connectors (e.g. willing to find a solution and maintain
>> multiple versions of the same connectors like kafka 0.8x and 0.9x)
>> - Becomes a home for other types of Spark related connectors, helping
>> expand the community around Spark (e.g.
>> Zeppelin sees most of its current contributions around new/enhanced
>> connectors)
>>
>> What are some requirements for Spark Extras to be successful:
>>
>> - Be up to date with Spark Trunk APIs (based on daily CIs against
>> SNAPSHOT)
>> - Adhere to Spark release cycles (have a very small window compared to
>> the Spark release)
>> - Be more open and flexible to the set of connectors it will accept and
>> maintain (e.g. also handle multiple versions like the kafka 0.9 issue we
>> have today)
>>
>> Where to start Spark Extras
>>
>> Depending on the interest here, we could follow the steps of (Apache
>> Arrow) and start this directly as a TLP, or start as an incubator
>> project. I would consider the first option first.
>>
>> Who would participate
>>
>> I have thought about this for a bit, and if we go in the direction of a
>> TLP, I would say Spark Committers and Apache Members can request to
>> participate as PMC members, while other committers can request to become
>> committers. Non committers would be added based on meritocracy after the
>> start of the project.
>>
>> Project Name
>>
>> It would be ideal if we could have a project name that shows close ties
>> to Spark (e.g. Spark Extras or Spark Connectors) but we will need
>> permission and support from whoever is going to evaluate the project
>> proposal (e.g. Apache Board)
>>
>> Thoughts ?
>>
>> Does anyone have any big disagreement or objection to moving in this
>> direction ?
>>
>> Otherwise, who would be interested in joining the project, so I can
>> start working on some concrete proposal ?
>>
>> On Sat, Mar 26, 2016 at 6:58 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>> This has been resolved; see the JIRA and related PRs but also
>> http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html
>>
>> This is not a scenario where a [VOTE] needs to take place, and code
>> changes don't proceed through PMC votes.
>> From the project perspective, code was deleted/retired for lack of
>> interest, and this is controlled by the normal lazy consensus protocol,
>> which wasn't vetoed.
>>
>> The subsequent discussion was in part about whether other modules
>> should go, or whether one should come back, which it did. The latter
>> suggests that change could have been left open for some discussion
>> longer. Ideally, you would have commented before the initial change
>> happened, but it sounds like several people would have liked more
>> time. I don't think I'd call that "improper conduct" though, no. It
>> was reversed via the same normal code management process.
>>
>> The rest of the question concerned what becomes of the code that was
>> removed. It was revived outside the project for anyone who cares to
>> continue collaborating. There seemed to be no disagreement about that,
>> mostly because the code in question was of minimal interest. PMC
>> doesn't need to rule on anything. There may still be some loose ends
>> there like namespace changes. I'll add to the other thread about this.
>>
>> On Sat, Mar 26, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>> > Hi,
>> >
>> > Although I'm not a very experienced member of the ASF, I share your
>> > concerns. I haven't looked at the issue from this point of view, but
>> > after having read the thread I think the PMC should've signed off on
>> > the migration of ASF-owned code to a non-ASF repo. At least a vote is
>> > required (and this discussion is a sign that the process has not been
>> > conducted properly, as people have concerns, myself included).
>> >
>> > Thanks Mridul!
>> >
>> > Regards,
>> > Jacek Laskowski
>> > ----
>> > https://medium.com/@jaceklaskowski/
>> > Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> > Follow me at https://twitter.com/jaceklaskowski
>> >
>> > On Thu, Mar 17, 2016 at 9:13 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> >> I am not referring to code edits - but to migrating submodules and
>> >> code currently in Apache Spark to 'outside' of it.
>> >> If I understand correctly, assets from Apache Spark are being moved
>> >> out of it into third-party external repositories - not owned by Apache.
>> >>
>> >> At a minimum, a dev@ discussion (like this one) should be initiated.
>> >> As the PMC is responsible for the project assets (including code),
>> >> signoff is required for it IMO.
>> >>
>> >> More experienced Apache members might be able to opine better in case
>> >> I got it wrong !
>> >>
>> >> Regards,
>> >> Mridul
>> >>
>> >> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org> wrote:
>> >>> Why would a PMC vote be necessary on every code deletion?
>> >>>
>> >>> There was a Jira and pull request discussion about the submodules that
>> >>> have been removed so far.
>> >>>
>> >>> https://issues.apache.org/jira/browse/SPARK-13843
>> >>>
>> >>> There's another ongoing one about Kafka specifically
>> >>>
>> >>> https://issues.apache.org/jira/browse/SPARK-13877
>> >>>
>> >>> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> >>>>
>> >>>> I was not aware of a discussion on the dev list about this - agree
>> >>>> with most of the observations.
>> >>>> In addition, I did not see PMC signoff on moving (sub-)modules out.
>> >>>>
>> >>>> Regards
>> >>>> Mridul
>> >>>>
>> >>>> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com> wrote:
>> >>>>>
>> >>>>> Hello all,
>> >>>>>
>> >>>>> Recently a lot of the streaming backends were moved to a separate
>> >>>>> project on github and removed from the main Spark repo.
>> >>>>>
>> >>>>> While I think the idea is great, I'm a little worried about the
>> >>>>> execution. Some concerns were already raised on the bug mentioned
>> >>>>> above, but I'd like to have a more explicit discussion about this so
>> >>>>> things don't fall through the cracks.
>> >>>>>
>> >>>>> Mainly I have three concerns.
>> >>>>>
>> >>>>> i. Ownership
>> >>>>>
>> >>>>> That code used to be run by the ASF, but now it's hosted in a github
>> >>>>> repo owned not by the ASF. That sounds a little sub-optimal, if not
>> >>>>> problematic.
>> >>>>>
>> >>>>> ii. Governance
>> >>>>>
>> >>>>> Similar to the above; who has commit access to the above repos? Will
>> >>>>> all the Spark committers, present and future, have commit access to
>> >>>>> all of those repos? Are they still going to be considered part of
>> >>>>> Spark and have release management done through the Spark community?
>> >>>>>
>> >>>>> For both of the questions above, why are they not turned into
>> >>>>> sub-projects of Spark and hosted on the ASF repos? I believe there is
>> >>>>> a mechanism to do that, without the need to keep the code in the main
>> >>>>> Spark repo, right?
>> >>>>>
>> >>>>> iii. Usability
>> >>>>>
>> >>>>> This is another thing I don't see discussed. For Scala-based code
>> >>>>> things don't change much, I guess, if the artifact names don't change
>> >>>>> (another reason to keep things in the ASF?), but what about python?
>> >>>>> How are pyspark users expected to get that code going forward, since
>> >>>>> it's not in Spark's pyspark.zip anymore?
>> >>>>>
>> >>>>> Is there an easy way of keeping these things within the ASF Spark
>> >>>>> project? I think that would be better for everybody.
>> >>>>>
>> >>>>> --
>> >>>>> Marcelo
>> >>>>>
>> >>>>> ---------------------------------------------------------------------
>> >>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/