Yet another (inevitably flawed) data set: https://libraries.io/licenses
On Tue, Jan 10, 2017, 11:07 AM Luis Villa <l...@lu.is> wrote:

> [Apparently I got unsubscribed at some point, so if you've sent an email
> here in recent months seeking my feedback, please resend.]
>
> Hey, all-
> I promised some board members a summary of my investigation in '12-'13
> into updating, supplementing, or replacing the "popular licenses" list.
> Here goes.
>
>
> *tl;dr*
> I think OSI should have a data-driven short license list with a
> replicable and transparent methodology, supplemented by a new-and-good(?)
> list that captures licenses that aren't yet popular but are high quality
> and have some substantial improvement that advances the goals of OSI.
>
>
> *Purposes of non-comprehensive lists*
> If you Google "open source licenses", OSI pages are the top two hits.
> Historically, those pages were not very helpful unless you already knew
> something about open source. Having a shorter "top" list can help make the
> OSI website more useful to newcomers by suggesting a starting place for
> their exploration and education about open source.
>
> In addition, third parties often look to OSI as a trusted (neutral?)
> source for "top" or "best" licenses that they can incorporate into
> products. (The full OSI-approved list is not practical for many
> applications.) For example, if OSI had an up-to-date short list, it might
> have been the basis for GitHub's license chooser.
>
> A list that is purely based on popularity would freeze open source in a
> particular time, likely making it hard for new licenses with important
> innovations to get adoption. However, a list based on more subjective
> criteria is hard to create and update.
>
> *Past attempts*
>
> The proliferation report attempted to address this problem by categorizing
> existing licenses. These categories were, intentionally or not, seen as the
> "popular or strong communities list" and "everything else". Without a
> process or clear set of criteria to update the "popular" list, however, it
> became frozen in time. It is now difficult to credibly recommend the list
> to newcomers or third parties (MPL 1.1 is deprecated; no mention of
> Black Duck's #4, GPL v3; etc.).
>
> There was also substantial work done towards a license "chooser" or
> "wizard". However, this runs into some of the same problems - either the
> chooser is opinionated (and so pisses off people, and potentially locks the
> licenses in time) or is borderline-useless for newcomers (because it still
> requires substantial additional research after using it).
>
> *Data-driven "popular" list*
>
> With all that in mind, I think that OSI needs a (mostly) data-driven
> "popular" shortlist, based on a scan of public code + application of
> (mostly?) objective rules to the outcome of that scan.
>
> To maintain OSI's reputation as being (reasonably) neutral and
> independent, OSI should probably avoid basing this on third-party license
> surveys (e.g., Black Duck
> <https://www.blackducksoftware.com/top-open-source-licenses>) unless
> their methodologies and data sources are well-documented. Ideally someone
> will write code so that the "survey" can be run by OSI and reproduced by
> others.
>
> Hard decisions on how to collect and "process" the data will include:
>
> - *choice of data sources:* What data sources are drawn on? Key Linux
>   distros? GitHub? per-language repos like maven, cpan, npm, etc?
> - *what are you counting?* Projects? (May favor small, throwaway
>   projects?) Lines of code? (May favor the largest, most complex projects?)
>   ... ?
> - *which license tools?* Some scanners are more aggressive in trying
>   to identify *something*, while others prefer accuracy over
>   comprehensiveness. In 2013 there was no good answer to this, but my
>   understanding is that fossology now has three different scanners, so for
>   OSI's purposes it may be sufficient to take those three and average.
>   - Could throw in Black Duck or other non-transparent surveys as a
>     fourth, fifth, etc.?
> - *new versions?* If a new version exists but isn't widely adopted
>   yet, how does the list reflect that? e.g., MPL 1.1 still shows up in Black
>   Duck's survey; should OSI replace 1.1 with 2.0 in the "processed" list?
>   What about GPL v2 v. v3? BSD/MIT v. UPL?
> - *gaps/"mistakes":* What happens when the board thinks the data is
>   incorrect? :) e.g., should ISC be listed?
>
> Part of why we didn't go very far in 2013 is that there are no great
> answers for these - different answers will reflect different values, and
> have different engineering impact. They're all hard choices for the board,
> the developers, hopefully license-discuss, and perhaps a broader community.
>
> Hat tip: Daniel German was invaluable to me in thinking through these
> questions.
>
> *Supplementing with high-quality, value-adding options*
> To encourage progress, while still avoiding proliferation, I'd suggest a
> second list of licenses that are good but not (yet?) popular. "Good" would
> be defined as something like:
>
> 1. meets the OSD
> 2. isn't on the data-driven popularity list
> 3. drafted by an attorney (at minimum) or by a collaborative, public
>    drafting process with clear support from a sponsoring-maintaining
>    organization (ideal)
> 4. has a new "feature" that is firmly in keeping with the overall
>    goals of open source and can be concisely explained in a few sentences
>    (e.g., for UPL, "GPL-compatible permissive license with explicit patent
>    grant")
>    1. but not "just for a particular community" - has to be at least
>       plausibly applicable to most open source projects
>    2. this is unavoidably subjective; suggest having it fall to the
>       board with pre-discussion on license-review.
>
> #4 allows for some innovation (and OSI support of such innovation) while
> #3 applies a quality filter. (Both #3 and #4 have anti-proliferation
> effects.) Hopefully licenses that meet #3 and #4 would eventually move into
> #2, but you could imagine placing a time limit on this list; if you're not
> in the top 10 most popular within five years, then you get retired? But not
> sure that's a good idea at all - just throwing it out as one option.
>
> If a new license meets #1, but not #3 and #4, then OSI's formal policy
> should be to approve, but bury it in one of the other proliferation list
> groups. (Those groups are actually quite good, and should be fairly
> non-controversial - once you have a good policy for what gets in the more
> "favored" groups.) I don't think a new "deprecated" group is necessary -
> the proliferation categories are basically a good list of that already.
>
> This is still a somewhat subjective process, and if it had been in place
> in '99-'06, it would have been fairly fraught. However, I think most of the
> "action" in open source organization has moved on to other areas (e.g.,
> foundation structure, CoCs, etc.), and the field has matured in other ways,
> so I think this is now a practicable approach in ways it would not have
> been a decade or even five years ago.
>
> *Miscellaneous notes*
>
> - I don't recommend merely updating the existing "popular and..." list
>   through a subjective or one-time process. The politics of that will be
>   messy, and without a documented, mostly-objective, data-driven method,
>   it'll again become an outdated mess.
> - The OSD should probably be updated. At the least, this should address
>   things like whether a formal patent grant is required of new
>   licenses; more ambitiously it might follow Open Definition 2.x
>   <http://opendefinition.org/od/2.1/en/> by splitting out open licenses
>   from open works.
> - With SPDX and Fedora providing more comprehensive lists of FOSS
>   licenses, it might make sense for OSI to link to those as "extended"
>   resources, to reduce pressure from obscure license authors to get their
>   license approved.
> - The biggest pressure on this process will continue to be licenses
>   that try to open up space for new commercial business models (e.g., Fair
>   Source). The more OSI can write/document/buttress OSD #1, the better.
> - I used to think a license wizard was a good idea, but I don't any
>   more. I thought the copyleft spectrum was really the only important
>   decision-making factor, which made the idea plausible, but non-copyleft
>   factors matter much more than I once thought, and make simplifying to a
>   "wizard" too hard for OSI (though perhaps still plausible for a third
>   party).
> - Documentation of what the copyleft spectrum *is*, what the key
>   licenses on it are, and what other factors might be relevant, is still a
>   good idea, but is secondary to getting the basic lists right.
>
> HTH-
>
> Luis
> --
> *Luis Villa: Open Law and Strategy <http://lu.is>*
> *+1-415-938-4552*
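
A minimal sketch of two of the questions raised in the quoted message: what to count (projects vs. lines of code) and how to combine several scanners by averaging. Everything here is an assumption for illustration: the scanner names, the project records, and the numbers are invented, and the averaging rule (mean of per-scanner shares) is just one plausible reading of "take those three and average", not a method anyone on the list has endorsed.

```python
from collections import defaultdict

# Hypothetical scan results: scanner name -> list of (project, license, lines_of_code).
# Scanner names, projects, and numbers are invented for illustration only.
SCANS = {
    "scanner_a": [("p1", "MIT", 100), ("p2", "GPL-3.0", 900), ("p3", "MIT", 200)],
    "scanner_b": [("p1", "MIT", 100), ("p2", "GPL-3.0", 900), ("p3", "ISC", 200)],
}


def shares(records, weight):
    """Return each license's share of the total under the given weighting."""
    totals = defaultdict(float)
    for _project, license_id, loc in records:
        totals[license_id] += weight(loc)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # an empty scan contributes nothing
    return {lic: value / grand_total for lic, value in totals.items()}


def averaged_ranking(scans, weight):
    """Average per-scanner shares, then sort from most to least common."""
    summed = defaultdict(float)
    for records in scans.values():
        for lic, share in shares(records, weight).items():
            summed[lic] += share
    averaged = {lic: total / len(scans) for lic, total in summed.items()}
    # Ties are broken alphabetically so the output is reproducible.
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))


# Counting projects: every project weighs 1, whatever its size.
by_project = averaged_ranking(SCANS, weight=lambda loc: 1)
# Counting lines of code: large projects dominate.
by_loc = averaged_ranking(SCANS, weight=lambda loc: loc)

print("by project:", by_project)
print("by lines of code:", by_loc)
```

With this toy data the two weightings disagree on the top license, which is the trade-off the "what are you counting?" bullet describes. Disagreement between scanners (here, over the license of `p3`) shows up as a diluted share rather than an error.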
_______________________________________________
License-discuss mailing list
License-discuss@opensource.org
https://lists.opensource.org/cgi-bin/mailman/listinfo/license-discuss