Hi Osamu,

It has been quite a while since we met in person in Canada and talked about these things... :-)

Quoting Osamu Aoki (2026-01-07 03:01:26)
> While reviewing recently closed #1121378
> https://bugs.debian.org/1121378 , I realized d/copyright generation
> needs to be updated.
>
> As mentioned in #1121378, CC0-1.0 and MPL-1.1 and MPL-2.0 need to be
> addressed at least.
>
> While looking for the best practice example for CC0-1.0 using:
> https://codesearch.debian.net/search?q=CC0-1.0
> I found the glib2.0 package includes an SPDX reference
>
> https://sources.debian.org/src/glib2.0/2.86.3-4/debian/copyright?hl=1091#L1091
>
> License: CC0-1.0
>  SPDX license expression "CC0-1.0": https://spdx.org/licenses/CC0-1.0.html
>  On Debian systems, the complete text of the CC0 Public Domain Dedication
>  can be found in "/usr/share/common-licenses/CC0-1.0".
>
> I also saw:
>
> License: Expat
>  SPDX license expression "MIT": https://spdx.org/licenses/MIT.html
>  .
>  Permission is hereby granted, free of charge, to any person obtaining a copy
>  ...
>
> This style of text including SPDX reference is a nice one and the
> updated output of debmake may follow this style.

Please don't be creative with the License field (beyond deciding on a shortname, spacing and punctuation).

By definition, the License field should contain the licensing provided by the copyright holder *verbatim*, i.e. with no creative additions about how that text happens to correlate (verbatim, semantically, or legally in some jurisdiction) to some SPDX or other license collection. The License field must contain only two things: a) a shortname and b) the contents of the licensing statement.

Technically, the SPDX license is already referenced, just in a super compact format, in that the machine-readable copyright file format recommends using SPDX shortnames.

Please don't abuse well-defined fields by extending them with wrong information. Instead I see two options: use the existing well-defined Comment field, or introduce a new field (yes, the format permits that, despite lintian complaining).

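For illustration only, here is a rough sketch of the first option applied to the CC0-1.0 stanza quoted above. It is not taken from any existing package, and it assumes the rest of that stanza is kept as-is: the SPDX pointer moves out of the License field into a Comment field, so the License field holds just the shortname and the licensing text.

```text
License: CC0-1.0
 On Debian systems, the complete text of the CC0 Public Domain Dedication
 can be found in "/usr/share/common-licenses/CC0-1.0".
Comment:
 SPDX license expression "CC0-1.0": https://spdx.org/licenses/CC0-1.0.html
```
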
If you choose to use a new field, then I strongly recommend adopting the Reference field that I use in 700+ packages. That field holds a list of IRIs, either relative or absolute, pointing to more information.

I use the Reference field in License sections when the copyright holder did not state licensing *content* but only *references* some general licensing text, which happens to be included in Debian below /usr/share/common-licenses and is therefore meaningful to only reference rather than include verbatim. NB! Lintian complains about this use, but I dare say that lintian is wrong here (see bug#786450).

I also use the Reference field in Files sections when copyright and/or licensing information is better covered somewhere other than within each individual file: either at an external web page (and what you talk about here sounds like *exactly* that), or in some README file within the upstream project but separate from the files. A neat detail here is that for "boring" files without spaces or special punctuation, a relative file IRI is simply the file path.

As a special case of the above Files usage, I use the Reference field to point to the copyright file itself for canonical statements (i.e. when copyright and licensing of a section is *declared* in the copyright file rather than mirroring external statements). Most commonly, this is used to hint that debian/copyright/* is authoritative information. The reason I came up with this "twist" about canonical licensing is that the upstream project may choose to appreciate the work we've done curating licensing information, and may therefore store a copy of debian/copyright e.g.
at contrib/copyright, which is nice but at the same time problematic: their copy might either bitrot or diverge in other ways from the canonical debian/copyright file, and any Comment field in the debian/copyright file stating "This is canonical information" would be true only for the original location but would instantly become a lie for any copies of the file. Using the statement "Reference: debian/copyright" instead means different things for the original and for any copied location, and both are true: either it means "you are already looking at the reference, i.e. this information is canonical", or it means "check this other location for more info".

> As recorded in Debian wiki: CopyrightReviewTools
> https://wiki.debian.org/CopyrightReviewTools
> there are many existing tools. Considering the core function of
> debmake is generating template file for Debian packaging, if
> possible, delegating Copyright Scanning Task to other program is one
> option to keep this debmake maintainable.
>
> I consider licensecheck mostly by Jonas Smedegaard to be the leading
> scanner.
> https://tracker.debian.org/pkg/licensecheck
> (Problem is it is in Perl which I don't use much.)
>
> Jonas has interesting discussion:
> https://lists.debian.org/debian-devel/2019/12/msg00197.html (Mo Zhou)
> https://lists.debian.org/debian-devel/2019/12/msg00207.html (Jonas)
>
> Since debmake and licensecheck scanner use different heuristics and
> different focus on generated output, it may not be easy to swap out
> current code with external call to licensecheck. (debmake has
> extensive MIT/Expat license variant extraction to d/copyright.)
> For now, it may be worth updating this debmake lc.py with minimal
> changes. (I may just call licensecheck as external program in the
> future.)

As for the heuristics, I notice that debmake uses regular expressions, similar to licensecheck.
For some years now, most licensecheck regex patterns have been (at least in theory) no longer Perl-specific, as they were separated out into a huge listing which can be made accessible as a YAML file. Please do tell if that might help your tool reuse patterns.

As for the output, licensecheck can produce several different output formats. Please consider filing a bug report against licensecheck describing how licensecheck ideally should spit out its findings for *your* needs, and we can then discuss whether that is easy or hard, and whether the challenge is technical or political (e.g. if you want licensecheck to behave more sloppily, then that is not in itself technically hard, but because I want sloppiness only *optionally*, it might be harder to implement in licensecheck than directly in a sloppy-only tool).

Even if you don't expect to embrace licensecheck, it would still be helpful if you filed "imaginative" wishlist bug reports against licensecheck, as a way to share your experience and knowledge in this field, for the potential benefit of *other* licensecheck users. :-)

Kind regards, and thanks for putting me in the loop,

 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet architect
 * Phone: +45 40843136  Website: http://dr.jones.dk/
 * Sponsorship: https://ko-fi.com/drjones

 [x] quote me freely  [ ] ask before reusing  [ ] keep private

