Guillem Jover writes ("Re: Bug#1130119: dpkg-source newly rejects unfinalised 
changelogs"):
> Because this was unintentional, I've prepared a minimal change to
> solve this for now, which I'll include in 1.23.8. Change attached.

Thanks.

> But

I wrote some point-by-point responses but I agree that we should talk
about the design questions properly.

I hope you will forgive a long and rather discursive reply.


I'm going to use the term "unfinalised" for the abstract concept of
changelog, or changelog-derived information, relating to a version
which is still in preparation.  Ie what in dgit.git is represented by
a changelog entry without a trailer line, and what dch likes to
represent by replacing the target with UNRELEASED.

Unfinalised changelogs have a variety of useful properties: they can
allow builds of pre-release packages to be distinguished (ideally, in
their version number); they can prevent accidental confusion between
unreleased and released software; indeed they can prevent
unintentional release.  I think this is common ground.

(In principle some of these purposes might be solved by use of git tag
information, without changing the tree.  Some other ecosystems and
upstreams work this way.  But git is not available everywhere in
Debian, git tags aren't reliably conveyed by git, and we generally
think that the output of a package build should depend only on the
tree and not on git objects/refs.  So I'm going to assume that we are
going to keep representing this information in-tree, in
debian/changelog.)

The unfinalisedness of the changelog starts out as text in the
changelog file.  It is then converted by various machinery (mostly in
dpkg-dev) to deb822 metadata in .changes etc.

We have two nearly separate questions to answer.

1. What syntax/format should we use in debian/changelog; and therefore
   what information should be present about the unreleased code.

2. What metadata values should appear at various stages.

In d/changelog, we *do* want the target suite, because that's a
property of the branch so ought to be changed explicitly.

We want *some* version number information; either "based on <previous
version>" or "prospective <new version>" would do.  Otherwise we can't
build sensible binaries from the in-development codebase.

Do we want any changed-by name/address?  We *could* name the person
doing the administrative work on the changelog.  Often, this is the
person who did the previous release - the new changelog entry is part
of routine post-release work.  Sometimes this is done by the first
person to make a post-release change.  But in neither case is this
really truthful.  Subsequent substantive changes will typically be
done by someone else.  ("Personal data shall be accurate".)

Taking a step back: what does changed-by really mean?  In a release,
it is always the person deciding upon and preparing the release.
(That might be different to a sponsor who signs it.)  Ie, the person
primarily responsible for the technical judgement that this code
should be released now.  That's how we use the changelog trailer in a
released version and it's how all the downstream tooling interprets
it.  It's *the human who made the release*.

An unreleased version *doesn't have a releaser*!  No human has made
the release because it's not a release.  Putting someone's name in
there to satisfy some tooling that demands that all code it works with
must have "someone who released it" is simply wrong.  The code *isn't
released* and programs that expect to be able to work with unreleased
software should not be demanding that some dummy human name is
inserted as the notional releaser.

Likewise, any date we put in the changelog is largely meaningless.
It certainly quickly gets out of date, in most workflows.  At least
there aren't data protection problems with including a wrong date, so
we could do that if we felt it convenient.

So I think an unfinalised debian/changelog file:

 - should *not* contain the information that is normally in the trailer;
 - *should* contain the target suite.

That is precisely what the traditional (Emacs mode, pre-dch)
unfinalised changelog format contains.


Now on to metadata (typically, in deb822 form).  I think we actually
have two kinds of systems that need to deal with changelog metadata.

1. Things that build source and binary packages, including local test
   builds, Salsa CI, and so on.

2. Software maintaining a repository of releases (ftp.debian.org,
   dgit-repos, tag2upload).

For (1) we need to be able to deal with unreleased source code.  We
need to be able to build it, and lint it, and basically do everything
short of actually publishing it as part of a release.  (More
sophisticated CI systems might want to "fake up" a private release,
but that's rather a different question and isn't the usual case.)

For (2), archive software, we must reject unreleased packages.
Existing software does this by rejecting UNRELEASED but also by
rejecting changelogs that lack either a releaser (changed-by) or
release date.


I think we can square this circle if we make dpkg-parsechangelog more
explicitly aware of the concept of an unfinalised changelog.

Let me sketch a possibility.  I haven't thought through all the
details, and this is just some spitballing.  Anyway, my suggestion:

For a changelog with an empty trailer line, dpkg-parsechangelog would:

 * Use the current date as the release date.
 * Use the current version but add `~0unreleased~`
   unless the version already ends in `~`.
 * Use the specified target suite but add `-UNRELEASED`.
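
As a rough illustration of those three rules (a sketch only, in Python
rather than dpkg's Perl; the function name `adjust_unfinalised` and
its `now` parameter are made up for this example and are not part of
dpkg):

```python
import time
from email.utils import formatdate


def adjust_unfinalised(version, distribution, now=None):
    """Return (version, distribution, date) for an empty-trailer entry.

    Hypothetical helper, not a dpkg API.  `now` is a Unix timestamp
    and defaults to the current time.
    """
    # Add the marker unless the version already ends in "~".
    if not version.endswith("~"):
        version += "~0unreleased~"
    # Make the target suite one that no archive will accept.
    distribution += "-UNRELEASED"
    # Use the current date as the release date (RFC 2822 style).
    timestamp = time.time() if now is None else now
    return version, distribution, formatdate(timestamp)


print(adjust_unfinalised("1.2-3", "unstable"))
```

For an entry `1.2-3` targeting `unstable` this yields the version
`1.2-3~0unreleased~` and the target `unstable-UNRELEASED`.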

All existing downstream tooling would handle this very similarly to
the current UNRELEASED approach.  Most downstream tooling would not
need any special knowledge.  Even archive tooling would DTRT because
there is no such target suite as foo-UNRELEASED.

Local test binaries would end up with ~unreleased~ in their versions,
distinguishing them nicely from official binaries being uploaded (or rebuilds
of released source code).

Build reproducibility tests on unreleased code would need to set
SOURCE_DATE_EPOCH because the tree would no longer contain a dummy
date.  If it has a git tree, it could be set to the git committer
date.  (dpkg-parsechangelog could perhaps honour an environment
variable to do this or something).
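
One way the date selection might look, assuming the environment
variable in question is SOURCE_DATE_EPOCH itself (that choice, and the
function name `release_timestamp`, are assumptions for illustration,
not existing dpkg behaviour):

```python
import os
import time


def release_timestamp(environ=None):
    """Pick the timestamp to report for an unfinalised entry.

    Hypothetical helper: prefer SOURCE_DATE_EPOCH when it is set,
    otherwise fall back to the current time.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("SOURCE_DATE_EPOCH")
    if value:
        return int(value)
    return int(time.time())


print(release_timestamp({"SOURCE_DATE_EPOCH": "1700000000"}))
```

A CI job with a git tree could export SOURCE_DATE_EPOCH from the
committer date of HEAD before building.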

We could eventually maybe change dch.


I think this kind of approach is consistent with dpkg-parsechangelog's
notion that the changelog format is for humans and can be interpreted
by software to do the Right Thing.  (And even, that some humans might
have a different format, although I'm not sure that support still
works.)

I also don't think this is distro-specific.  The changes I propose to
the version and upload target will be widely applicable.


If you don't like the idea of dpkg-parsechangelog modifying its output
so much, an approach which would allow people to have correct data in
their d/changelog, but without making visible changes to downstream
tooling would be: if there is no trailer, use the current date; change
the target to UNRELEASED; and use fixed dummy values for the
changed-by.  This would amount to emulating the UNRELEASED convention
in dpkg-parsechangelog.
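
Sketched in Python for illustration (the dict keys, the function name
`emulate_unreleased` and the dummy changed-by string are all
placeholders invented here, not anything dpkg defines):

```python
import time
from email.utils import formatdate

# Placeholder value; the real dummy string would need to be agreed.
DUMMY_CHANGED_BY = "Unfinalised Changelog <nobody@example.invalid>"


def emulate_unreleased(entry, now=None):
    """Fill in UNRELEASED-style fields for an entry with no trailer.

    Hypothetical helper.  `entry` is a dict with "version" and
    "distribution"; a finalised entry also has "changed_by" and "date".
    """
    out = dict(entry)
    if "changed_by" in out and "date" in out:
        return out  # finalised: pass through unchanged
    timestamp = time.time() if now is None else now
    out["date"] = formatdate(timestamp)
    out["distribution"] = "UNRELEASED"
    out["changed_by"] = DUMMY_CHANGED_BY
    return out


print(emulate_unreleased({"version": "1.2-3", "distribution": "unstable"}))
```

Note that the intended target suite is lost from the output here,
and the version is left untouched.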

I think this is worse because it *widens* the spread of wrong, dummy,
data.


I think I should write some specific responses.

>  One [drawback] is that the generated .changes does not contain the
> Changed-By field anymore (a field which is currently not marked as
> required, but I think that's wrong).

As I note above, the changed-by information in an unreleased package
is meaningless (and therefore wrong).  We should not use real personal
data for such a thing.

Indeed the UNRELEASED convention, which is the currently popular
approach to unfinalised changelogs, requires feeding *known invalid*
data to downstream tooling.  This is a hazard, obviously.  Software
that doesn't know that UNRELEASED means "the other data in this
changelog entry may be a lie" can do the wrong thing.

I think this analysis demonstrates that it's bad to have picky tooling
that demands complete release information even for possibly unreleased
packages.  At the root is: tools that might need to handle unreleased
packages shouldn't demand a declaration that the package is released.

> It means we cannot enforce failures for invalid syntax for the
> changelog (as specified both by deb-changelog(5) and Debian Policy,
> "must" clauses), which means any code that has to deal with
> changelog files needs to be lenient and accept invalid stuff,
> complicating the overall ecosystem.

I don't think this is invalid syntax.  It is meaningful and has been
valid (and generated by the Emacs mode for Debian changelogs)
literally forever.  If some of the documentation doesn't mention this
syntax, we can update the docs.

(As an aside, we do probably want changelog parsing to be lenient at
least for historical data including both historical entries and old
source packages.)

> Because the final distribution is specified (instead of the
> convention of using UNRELEASED), such builds could end up being
> uploaded into a queue or a repo.

In fact, existing upload tools do reject this situation already even
without my proposal for adjusting the target suite, because they
insist on a changed-by and a date.

> For the argument that the date (or even maintainer) information is
> wrong, I'd say that easily applies as well to the version,

Versions are often prospective IME.

In our case, we adjust the version too.  Notice the ~ in the version
number.  Ideally tooling would add some ~ automatically.  See above.

> So I would like to consider how to move into eventually deprecating
> this (I'd say unusual and problematic) workflow by adding support to
> the common workflow by improving its metadata tracking (such as the
> intended dist, or perhaps reverse that and use the actual target dist
> but mark it as finalized=no, or whatever).

Well, if we're into redesigning this, I would like to suggest that the
best syntax for an unfinalised changelog is in fact the existing empty
trailer syntax as generated by Emacs debian-changelog-mode, for the
reasons I've explained.


Thanks for your attention.

Ian.

-- 
Ian Jackson <[email protected]>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.
