Hi Osamu,

It has been quite a while since we met in person in Canada and talked
about these things... :-)

Quoting Osamu Aoki (2026-01-07 03:01:26)
> While reviewing recently closed #1121378
> https://bugs.debian.org/1121378 , I realized d/copyright generation
> needs to be updated.
> 
> As mentioned in #1121378, CC0-1.0 and MPL-1.1 and MPL-2.0 needs to be
> addressed at least.
> 
> While looking for the best practice example for CC0-1.0 using:
>  https://codesearch.debian.net/search?q=CC0-1.0
> I found glib2.0 package includes SPDX reference
>  https://sources.debian.org/src/glib2.0/2.86.3-4/debian/copyright?hl=1091#L1091
> 
> License: CC0-1.0
>  SPDX license expression "CC0-1.0": https://spdx.org/licenses/CC0-1.0.html
>  On Debian systems, the complete text of the CC0 Public Domain Dedication
>  can be found in "/usr/share/common-licenses/CC0-1.0".
> 
> I also saw:
> 
> License: Expat
>  SPDX license expression "MIT": https://spdx.org/licenses/MIT.html
>  .
>  Permission is hereby granted, free of charge, to any person obtaining a copy
> ...
> 
> This style of text including SPDX reference is a nice one and the
> updated output of debmake may follow this style.

Please don't be creative with the License field (beyond deciding on a
shortname, spacing and punctuation).

The License field should, according to its definition, contain the
licensing provided by the copyright holder, *verbatim* - i.e. no
creative additions about how that text happens to correlate, verbatim
and/or semantically or legally in some jurisdiction, with some SPDX or
other license collection. The License field must contain only two
things: a) a shortname and b) the contents of the licensing statement.

Technically, the SPDX license is already referenced, just in a very
compact form, in that the machine-readable copyright file format
recommends using SPDX shortnames. Please don't abuse well-defined
fields by extending them with wrong information. Instead I see two
options: Use the existing well-defined Comment field, or introduce a
new field (yes, the format permits that, despite lintian complaining).

If you choose to use a new field, then I strongly recommend adopting
the Reference field that I use in 700+ packages. That field holds a
list of IRIs, either relative or absolute, for more information.

I use the Reference field in License sections when the copyright
holder did not state licensing *content* but only *referenced* some
general licensing text which happens to be included in Debian below
/usr/share/common-licenses and which it therefore is meaningful to
only reference rather than include verbatim. NB! Lintian complains about
this use, but I dare say that lintian is wrong here (see bug#786450).

I also use the Reference field in Files sections, when copyright and/or
licensing information is better covered somewhere else than within each
individual file - either at an external web page (and what you talk
about here sounds like *exactly* that), or some README file within the
upstream project but separate from the files. A neat detail here is
that for "boring" files without spaces or special punctuation, a
relative file IRI is simply the file path.
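
To illustrate the two uses above, here is a minimal Python sketch that
reads Reference fields from a copyright-file fragment. The fragment,
the function names, and the assumption that several IRIs are separated
by whitespace or commas are all made up for illustration; the
machine-readable format itself does not define a Reference field.

```python
# Hypothetical debian/copyright fragment; names and values are illustrative.
SAMPLE = """\
Files: data/*
Copyright: 2020 Example Author
License: CC0-1.0
Reference: README

License: CC0-1.0
Reference: /usr/share/common-licenses/CC0-1.0
"""


def parse_paragraphs(text):
    """Split deb822-style text into a list of {field: value} dicts."""
    paragraphs, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, last = {}, None
        elif line[0] in " \t" and last:
            # Continuation line: append to the previous field.
            current[last] += "\n" + line.strip()
        elif ":" in line:
            last, _, value = line.partition(":")
            current[last] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def references(paragraph):
    """Return the IRIs of a paragraph's Reference field.

    Assumption: multiple IRIs are separated by whitespace or commas.
    """
    return paragraph.get("Reference", "").replace(",", " ").split()


for paragraph in parse_paragraphs(SAMPLE):
    print(paragraph.get("Files", "(License section)"), references(paragraph))
```

The Files paragraph points at a README inside the upstream project
(a relative IRI that is simply the file path), while the standalone
License paragraph points at the shared text shipped in Debian.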

As a special case of the above Files usage, I use the Reference field to
point to the copyright file itself for canonical statements (i.e. when
copyright and licensing of a section is *declared* in the copyright
file rather than mirroring external statements). Most commonly, this is
used to hint that debian/copyright/* is authoritative information. The
reason I came up with this "twist" about canonical licensing is that
the upstream project may choose to appreciate the work we've done
curating licensing information, and they therefore store a copy of
debian/copyright e.g. at contrib/copyright - which is nice but at the
same time problematic: Their copy might either bitrot or diverge in
other ways from the canonical debian/copyright file, and any Comment
field in the debian/copyright file stating that "This is canonical
information" would be true only for the original location but instantly
become a lie for any copies of the file. Instead using the statement
"Reference: debian/copyright" means different things for the original
and any copied location, both true: Either it means "you are already
looking at the reference, i.e. this information is canonical" or it
means "check this other location for more info".
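
The two readings of "Reference: debian/copyright" can be sketched in a
few lines of Python. This is only an illustration: the function name
is hypothetical, and it assumes that both the location of the file and
the relative reference are resolved against the root of the source
tree.

```python
import posixpath


def reference_role(file_location, reference):
    """Classify a relative Reference as seen from one copy of the file.

    Assumption: both paths are relative to the source tree root.
    """
    same = posixpath.normpath(file_location) == posixpath.normpath(reference)
    return "canonical" if same else "see " + reference


# The original file refers to itself, so its statements are canonical.
print(reference_role("debian/copyright", "debian/copyright"))
# An upstream copy carries the same field, which now points elsewhere.
print(reference_role("contrib/copyright", "debian/copyright"))
```

The same field value stays true in both places, which a Comment
claiming "this is canonical" could not do.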


> As recorded in Debian wiki: CopyrightReviewTools
> https://wiki.debian.org/CopyrightReviewTools
> there are many existing tools.  Considering the core function of
> debmake is generating template file for Debian packaging, if
> possible, delegating Copyright Scanning Task to other program is one
> option to keep this debmake maintainable. 
> 
> I consider licensecheck mostly by Jonas Smedegaard to be the leading
> scanner.
>   https://tracker.debian.org/pkg/licensecheck
> (Problem is it is in Perl which I don't use much.)
> 
> Jonas has interesting discussion:
>   https://lists.debian.org/debian-devel/2019/12/msg00197.html (Mo Zhou)
>   https://lists.debian.org/debian-devel/2019/12/msg00207.html (Jonas)
> 
> Since debmake and licensecheck scanner use different heuristics and
> different focus on generated output, it may not be easy to swap out
> current code with external call to licensecheck.  (debmake has
> extensive MIT/Expat license variant extraction to d/copyright.)
> For now, it may be worth updating this debmake lc.py with minimal
> changes.  (I may just call licensecheck as external program in the
> future.)

As for the heuristics, I notice that debmake uses regular expressions,
similar to licensecheck. For a few years now, most licensecheck regex
patterns have (at least theoretically) no longer been Perl-specific, as
they have been separated out as a huge listing, which can be made
accessible as a YAML file. Please do tell if that might be helpful for
your tool in reusing patterns.
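
As a rough idea of what reusing a language-neutral pattern listing
could look like on the Python side, here is a minimal sketch. The
table below is entirely hypothetical: the two patterns and the
name-to-regex layout are made up for illustration and are not taken
from the licensecheck data or its YAML export.

```python
import re

# Hypothetical pattern table (shortname -> regex); not licensecheck's data.
PATTERNS = {
    "Expat": r"Permission is hereby granted, free of charge, "
             r"to any person obtaining a copy",
    "CC0-1.0": r"CC0 1\.0 Universal",
}

COMPILED = {name: re.compile(rx, re.IGNORECASE) for name, rx in PATTERNS.items()}


def detect(text):
    """Return the sorted shortnames whose pattern occurs in the text."""
    # Collapse whitespace so that line wrapping does not break a match.
    normalized = " ".join(text.split())
    return sorted(name for name, rx in COMPILED.items() if rx.search(normalized))


header = """Permission is hereby granted, free
of charge, to any person obtaining a copy of this software"""
print(detect(header))
```

A real scanner would also need to strip comment markers and cope with
license variants, which this sketch does not attempt.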

As for the output, licensecheck can produce several different outputs.
Please consider filing a bug report against licensecheck, describing how
licensecheck ideally should spit out its findings for *your* needs, and
we can then discuss whether that is easy or hard, and whether the
challenge is technical or political (e.g. if you want licensecheck to
behave more sloppily then that is not in itself technically hard, but
because I want sloppiness only *optionally* it might be harder to
implement in licensecheck than directly in a sloppy-only tool).

Even if you don't expect to embrace licensecheck, it would still be
helpful if you filed "imaginative" wishlist bug reports against
licensecheck, as a way to share your experience and knowledge in this
field, for the potential benefit of *other* licensecheck users. :-)

Kind regards, and thanks for putting me in the loop,

- Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/
 * Sponsorship: https://ko-fi.com/drjones

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
