Hi!

On Tue, 2024-12-10 at 15:13:03 +0100, Raphaël Hertzog wrote:
> Package: dpkg-dev
> Version: 1.22.11
> Severity: normal
> X-Debbugs-Cc: [email protected]
>
> While maintaining tracker.debian.org, I started to get failures about
> invalid "Maintainer" fields.
>
> There's a clear violation with a Maintainer field with two maintainers:
> https://bugs.debian.org/1076048
> Maintainer: Steve Langasek <[email protected]>, Michael Vogt
>  <[email protected]>
>
> But we also have many cases where there's a trailing comma:
>
> Maintainer: Debian Security Team <[email protected]>,
> Maintainer: Daniel Baumann <[email protected]>,
>
> And yet nothing complained about this (neither dpkg, nor lintian, nor
> dak). dpkg-source and dpkg-gencontrol happily copied the invalid data.

Right, this is one of several fields where the dpkg tools do not really
parse or normalize the values. I've been considering that a misfeature,
because we then get this kind of output, and every consumer needs to
handle such unexpected or wrong values.

> After discussion on #debian-qa, we believe that the toolchain should strip
> the trailing comma to bring the field back into compliance. Much like it
> will clean up commas in dependencies.

I think that would be fine, but for Maintainer only during a
transitional period and not as a long-term supported feature. This is
not like a dependency field, where you have multiple values and a
trailing comma makes sense when placing each value on its own line, to
handle empty substvars, or similar.

For Uploaders it makes sense to always strip them, given that it is a
comma-separated list (and where «wrap-and-sort -sat» seems the best
format).

For commas in the middle, I think that will need some more consideration
(see below).

> But when that is not sufficient, it probably makes sense to fail and
> report the problem? There's a single case that would be broken right now.
> But a dozen of packages with trailing commas.
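
To make the "strip the trailing comma, otherwise fail" behaviour being
discussed concrete, here is a minimal sketch in Python. It is not dpkg
code: the function name normalize_maintainer is hypothetical, the
addresses are example.org placeholders, and it deliberately ignores
quoted names, so any comma left after stripping is treated as an error.

```python
def normalize_maintainer(value: str) -> str:
    """Strip trailing commas from a Maintainer value, or fail.

    Hypothetical helper for illustration only; not part of dpkg.
    Assumption: a remaining comma means more than one maintainer
    (quoted names containing commas are not handled here).
    """
    # Remove surrounding whitespace, then any trailing commas.
    cleaned = value.strip().rstrip(",").rstrip()
    if "," in cleaned:
        raise ValueError(f"comma inside Maintainer value: {value!r}")
    return cleaned


if __name__ == "__main__":
    print(normalize_maintainer("Daniel Baumann <daniel@example.org>,"))
```

A value with two maintainers would raise ValueError here instead of
being copied through unchanged.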

I was checking the state of the archive, slightly after this got filed,
and the problem seems worse to me:

  ,---
  $ grep-deb-sources -e -sPackage,Maintainer,Extra-Source-Only \
      -FMaintainer '.*,' | grep ^Package: | wc -l
  49
  `---

  ,---
  $ grep-deb-sources -e -sPackage,Maintainer,Uploaders,Extra-Source-Only \
      -FUploaders '.*,$' | grep ^Package: | wc -l
  6574
  `---

In addition to the already reported golang-github-mvo5-goconfigparser,
another one of those has a comma in the middle of the Maintainer field,
but it is an Extra-Source-Only:yes source package:

  Package: darts
  Maintainer: Natural Language Processing, Japanese <[email protected]>
  Extra-Source-Only: yes

The version in unstable looks fine.

I then started writing a parser for these fields, went checking for what
the allowed documented syntax would be, and ended up in the same rabbit
hole as the following Debian Policy bug reports: #401452, #509935 and
#962277.

I do think we should clarify the syntax first. IMO we should go for the
simplest possible syntax, probably based on the RFCs while avoiding all
obsolete constructs, but allowing for the currently used names which
include a comma. To me that means supporting quoted names, which we
already have in debian/changelog trailers, which we would need to keep
supporting anyway to be able to parse old entries, and which we need to
match against the Maintainer fields.

I'll try to write a parser for the above, throw it against the archive
Sources indices, and see what can be warned on and what would need to
error out directly, then probably update the Debian Policy bug reports.

Thanks,
Guillem
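
For illustration, a minimal sketch of what "supporting quoted names"
could look like when splitting such a field: commas inside double
quotes are kept as part of the name, all other commas separate entries.
This is only an assumption about one possible syntax, not the parser
mentioned above; split_addresses is a hypothetical name, the addresses
are example.org placeholders, and backslash escapes inside quotes are
not handled.

```python
def split_addresses(value: str) -> list[str]:
    """Split a field value on commas that are outside double quotes.

    Hypothetical helper for illustration only; not part of dpkg.
    """
    entries = []
    current = []
    in_quotes = False
    for ch in value:
        if ch == '"':
            # Toggle quoting state; the quote itself is kept in the entry.
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            entries.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    entries.append("".join(current).strip())
    # Drop empty entries, such as the one produced by a trailing comma.
    return [entry for entry in entries if entry]


if __name__ == "__main__":
    field = '"Natural Language Processing, Japanese" <nlp@example.org>'
    print(split_addresses(field))
```

With this approach a Maintainer value is valid when the result has
exactly one entry, while an Uploaders value may have any number.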

