On Tue, Nov 15, 2011 at 07:46:54AM +0100, "Martin v. Löwis" wrote:
> > 1) The other licenses which have versions attached to them do not place the
> > version into a fourth level
>
> That's probably because nobody thought of it.
>
<nod> But consistency in the classifiers is a nice thing.  If your goal as a
consumer of the data is really trying to isolate the version from the rest
of the license name, for instance, you already have to parse the version off
the end of some strings.  Making new classifiers that use another level for
a version means you have to maintain parsing of the version as a separate
field as an additional case.
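
As a rough sketch of that extra case: the function name, the regular
expression, and the fourth-level classifier string used in the usage note
below are all made up for illustration, and nothing here is a PyPI API. It
only handles names that end in a bare "vN"; names with a trailing
abbreviation in parentheses would need yet another case.

```python
import re

# Trailing version at the end of a license name, e.g. "... License v3".
# This pattern is an assumption for illustration only.
_TRAILING_VERSION = re.compile(r"^(?P<name>.*\S)\s+v(?P<version>\d+(?:\.\d+)*)$")


def split_license_version(classifier):
    """Return (license_name, version); version is None if none is found."""
    parts = [p.strip() for p in classifier.split("::")]
    last = parts[-1]

    # Case 1 (existing style): the version is the tail of the license name.
    match = _TRAILING_VERSION.match(last)
    if match:
        return match.group("name"), match.group("version")

    # Case 2 (hypothetical fourth-level style): the version is its own level.
    if len(parts) >= 4 and re.fullmatch(r"v?\d+(?:\.\d+)*", last):
        return parts[-2], last.lstrip("v")

    # No version that we know how to recognize.
    return last, None
```

Both `"License :: OSI Approved :: GNU Affero General Public License v3"` and
a hypothetical `"License :: OSI Approved :: GNU General Public License :: v3"`
go through the same function, but each needs its own branch.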

Note that one of those other licenses is a GNU license:

  License :: OSI Approved :: GNU Affero General Public License v3

> > 2) The utility of searching like that is limited.
>
> Why do you say that? You have full search capabilities either way. In
> fact, sub-classifiers improve the search capabilities.
>
I say this because in this case the different versions are simply ways of
naming different licenses.  The licenses have different terms and conditions
and in and of themselves are not even compatible.  The value of categorizing
the GPLv2 and GPLv3 license together is rather slim.  The value of
categorizing the MIT and new BSD licenses together would be higher than the
value of categorizing the GPLv2 and GPLv3 licenses together.  Wishing to
categorize the GPLv2 and GPLv3 together is only a superficial goal based on
the fact that they share the common substring "GNU General Public License"
in their name.

> > If I'm searching for
> > particular licenses, it's typically because I need to know whether the
> > license is compatible with some other license.
>
> I'm not really sure what the common case for searching for a license is.
> One reason might also be that people want to know what the most popular
> licenses are, and would there want to aggregate GPL (any version).
>
Commonly people would not want to aggregate the GPL licenses here because
the GPL licenses are incompatible with each other and have different terms
and conditions.  If you want to know about popular licenses, you would want
to keep the GPLv2 and GPLv3 licenses separate in your count.  Or put another
way, if you are counting the GPLv2 and GPLv3 together, you likely aren't
really looking for a count of popular licenses.  You're likely looking for
a count of types of licenses.
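
To make the difference between the two counts concrete, here is a small
sketch.  The sample data is invented (six imaginary packages) and the
function names are only illustrative.

```python
import re
from collections import Counter

# Made-up sample: the license name of six imaginary packages.
SAMPLE_LICENSES = [
    "GNU General Public License v2",
    "GNU General Public License v3",
    "GNU General Public License v3",
    "MIT License",
    "MIT License",
    "MIT License",
]


def count_by_license(names):
    """Count each license+version as its own license."""
    return Counter(names)


def count_by_brand(names):
    """Count after stripping a trailing version such as ' v3' (brand only)."""
    return Counter(re.sub(r"\s+v\d+(?:\.\d+)*$", "", name) for name in names)
```

With this sample, the brand count puts "GNU General Public License" level
with "MIT License" at 3 each, even though no single GPL version is used by
more than 2 of the packages.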

These are some of the ways that people might like to categorize licenses:

* Copyleft, non-copyleft, proprietary
* GPLv2-compatible, GPLv3-compatible, non-GPL compatible open source,
  proprietary
* Backed by a legal team, analyzed by a legal team, not rigorously analyzed

These are somewhat helped by having the version separated but they all have
issues, as separating the version portion of the license name from the rest
is only an imperfect match for what you really want.  In the end, you have
to maintain lists of licenses that meet your categorizing criteria.  Having
the version as a separate field doesn't help, as the textual portion of the
license name and the version portion have to be considered together to
determine which terms and conditions need to be evaluated in your context.
The only thing that the raw name without version really brings to the table
is brand identification.

Here's a different example of this -- Let's say we were designing categories
for people who might want to classify software by which programming language
it was written in.  Would we want to group perl, python, and php together
because someone might want to search out popularity of languages according
to which begin with "p"?  Do we want to group Visual Basic and C# together
because both originated with Microsoft?  These groupings may fit with what
someone wants to accomplish but they're superficial groupings based on
things that have nothing to do with what the subject matter is actually
about.  Similarly, grouping the licenses based on the existence of a common
substring in the name is basing it on a superficial characteristic of the
license.  Instead, determine which attributes of the license are important
to you, then optimize for those.

> But let's assume that people actually search for licenses to find
> software that is compatible with their needs (whether that is their
> license, their company policies, or their personal preferences).
>
> > The GPL v2 and version 3 licenses are not compatible with each other.
>
> Then you search for either one by subclassifier (as you would for
> a flat classification). However, there are also cases where the
> license in question is compatible with both GPL versions, so you would
> want to search for GPL "any version".
>
This is lawyerly-debatable.  Since the GPLv2 and GPLv3 are incompatible and
since they are strong copylefts, the FSF's position has been that you cannot
license code in such a way that it can use both GPLv2 and GPLv3 licensed
code.  This has not been tested in court yet so lawyers continue to debate
the validity of this and the applicability in different situations (for
instance, is dynamic linkage different from shared?  Are scripting languages
different than compiled?) but anyone who doesn't want to risk having to go
to court over this needs to keep it in mind.

> > With this in mind, it seemed like code which used the trove license
> > categories would need to operate on each license+version independently,
> > even if we grouped them that way in the categorization scheme.
>
> In the use cases you cited. I think there are also use cases where you
> would want the entire supercategory.
>
I could see *other* supercategories which could be of benefit but I don't
see that there is a case to be made for this particular separation.
A license's version is a part of its name, as the terms and conditions can
change dramatically between versions (and in the cases of GPLv2 and GPLv3,
and of LGPLv2 and LGPLv3, they did).

Here are some examples of other supercategories that could be considered:

  :: FSF :: GNU General Public License v3
  :: Apache :: Apache Software License v2

This scheme would highlight the body that created a license.  Not all
licenses have a traceable or well known originator, though.  But for the
ones that do, this helps to answer the question of what legal team created
it or stands behind it/can explain the intent when it was drafted.

  :: GNU Lesser General Public License v2 (LGPLv2) :: 2.0
  :: GNU Lesser General Public License v2 (LGPLv2) :: 2.1
  :: GNU Lesser General Public License v3 (LGPLv3) :: 3.0

This scheme would highlight when minor changes are made vs incompatible
changes.  This particular example is problematic, however, because the
particular change incorporated between v2.0 and v2.1 was a rename.  The 2.0
version is actually the GNU Library General Public License.

The concept is problematic as deciding what's an incompatible change and
what isn't is debatable.  In the GPL context, version 2 and version 3 are
incompatible with each other so it's clear that they are different licenses.
On the other hand, you have licenses like Old BSD (aka BSD with advertising)
versus the New BSD license.  The two are compatible with each other and the
common name for each is the same but externally, the new BSD license is
compatible with the GPL while the old one is not which is a major
difference.

Talking about supercategories for licenses, though, points me in the
direction that really we're talking about wanting to add attributes to
describe the license itself, not attributes to describe the software.  That
seems out of scope for the trove categories being used on pypi.  (pypi
already sorta does this with the OSI Approved supercategory marking the
licenses as open source... but I'm wondering if that wasn't a mistake.
After all, the FSF maintains a similar list of licenses and the two lists
are not super/subsets of each other.)  Limiting it to the license name (of
which the version is a part as the version + textual portion of the name
together reference the set of terms and conditions which make up the
license) might be better.

-Toshio
_______________________________________________
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig