Jonas Smedegaard <jo...@jones.dk> writes:

> Strictly speaking it is not (as I was more narrowly focusing on) that
> the current debian/copyright spec leaves room for *ambiguity*, but
> instead that there is a real risk of making mistakes when replacing with
> centrally defined ones (e.g. redefining a local "Expat" from locally
> meaning "MIT-ish legalese as stated in this project" to falsely mean
> "the MIT-ish legalese that SPDX labels MIT").

Right, the existing copyright format defines a few standard labels and says that you should only use those labels when the license text matches, but it doesn't stress that "matches" means absolutely word-for-word identical. I suspect, although I haven't checked, that we've made at least a few mistakes where some license text that's basically equivalent to Expat is labelled as Expat even though the text is not word-for-word identical.

Given that currently all labels in debian/copyright are essentially local and the full text is there (except for common-licenses, where apart from BSD the licenses are normally used verbatim), this is not really a bug at present. But we could turn it into a bug quite quickly if we relied on the license short name to look up the text.

To take an example that I've been trying to get rid of for over a decade, many of the /usr/share/common-licenses/BSD references currently in the archive are incorrect. There are a few cases where the code is literally copyrighted only by the Regents of the University of California and uses exactly that license text, but this is not the case for a lot of them. It looks like a few people have even tried to say "use common-licenses but change the name in the license" rather than reproducing the license text, which I don't believe meets the terms of the license (although it's of course very unlikely that anyone would sue over it).

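To make "word-for-word identical" concrete, here is a minimal sketch of the strict check a hypothetical tool would need before replacing a package's own license text with a centrally stored copy. The function name `matches_verbatim` and the sample clause are illustrative assumptions, not part of any existing Debian tool; the sketch assumes the texts have already been extracted as plain text and treats only whitespace (line wrapping, indentation) as insignificant.

```python
# Hypothetical helper: before a tool swaps a package's own license text for a
# centrally stored copy (e.g. a file under /usr/share/common-licenses/), it
# must confirm the two texts are word-for-word identical. Only whitespace is
# ignored; any change in wording, including a different name, is a mismatch.

def matches_verbatim(candidate: str, reference: str) -> bool:
    """Return True only if candidate has exactly the same words as reference."""
    # str.split() with no argument splits on any run of whitespace,
    # so re-wrapped or re-indented text still compares equal.
    return candidate.split() == reference.split()


# Illustrative clause modeled on the BSD non-endorsement clause.
REFERENCE_CLAUSE = (
    "Neither the name of the University nor the names of its contributors\n"
    "may be used to endorse or promote products derived from this software\n"
    "without specific prior written permission."
)

# Same words, different line wrapping: safe to substitute.
rewrapped = (
    "Neither the name of the University nor the names of its\n"
    "  contributors may be used to endorse or promote products derived\n"
    "  from this software without specific prior written permission."
)

# "Basically equivalent", but names a different party: NOT the same text.
renamed = REFERENCE_CLAUSE.replace("the University", "Example Corp")

print(matches_verbatim(rewrapped, REFERENCE_CLAUSE))  # True
print(matches_verbatim(renamed, REFERENCE_CLAUSE))    # False
```

A check like this is deliberately stricter than matching by license label: a text that is legally equivalent but names a different party fails it, which is exactly the case where the package's own text has to be kept.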
A quick code search turns up the following examples, all of which I believe are wrong:

    https://sources.debian.org/src/mrpt/1:2.10.0+ds-3/doc/man-pages/pod/simul-beacons.pod/?hl=35#L35
    https://sources.debian.org/src/gridengine/8.1.9+dfsg-11/debian/scripts/init_cluster/?hl=7#L7
    https://sources.debian.org/src/rust-hyphenation/0.7.1-1/debian/copyright/?hl=278#L278
    https://sources.debian.org/src/nim/1.6.14-1/debian/copyright/?hl=64#L64
    https://sources.debian.org/src/yade/2023.02a-2/debian/copyright/?hl=78#L78

An example of one that probably is okay, although ideally we still wouldn't do this because there are other copyrights in the source:

    https://sources.debian.org/src/lpr/1:2008.05.17.3+nmu1/debian/copyright/?hl=15#L15

This problem would potentially come up a lot with the BSD licenses. The copyright-format document points to SPDX, and SPDX, since it only cares about labeling legally equivalent documents, allows the license text to vary in things like the name of the person you're not supposed to say endorsed your software while still assigning the same label. We therefore cannot use SPDX alone to determine whether we can substitute the text of the license automatically for people: there are SPDX labels for a lot of licenses whose exact text we would still need to copy and paste, because that text varies. At least, that's how I understand what our goals would be.

(License texts that have portions that vary between the packages they apply to are a menace and make everything much harder, and I really wish people would stop using them, but of course the world of software development is not going to listen to me.)

-- 
Russ Allbery (r...@debian.org)             <https://www.eyrie.org/~eagle/>