Package: src:dpkg
Severity: wishlist
X-Debbugs-Cc: [email protected]

Hi Guillem,

I wanted to continue the design discussion about a potential new
feature in dpkg to record the git commit id and git tree id in the
build artifacts. You already provided a "shallow pass" in
https://lists.debian.org/debian-devel/2025/12/msg00150.html and
https://lists.debian.org/debian-devel/2025/12/msg00152.html

After thinking about this, I'd like to propose the following
high-level design, which I as a dpkg user would find most useful and
conceptually clearest for both packagers and supply-chain verifiers.

## New records in build artifacts for git commit id and git tree id

The goal would be to ensure that both source and binary builds record
somewhere which git commit id (git rev-parse HEAD) and which git tree
id (git rev-parse HEAD^{tree}) were used for the build.

It could be, for example, in the .buildinfo file and look something like this:

$ cat btop_1.4.5-1_amd64.buildinfo
Format: 1.0
Source: btop
Binary: btop
Architecture: amd64
Version: 1.4.5-1
Checksums-Md5:
 149c7e4ae56bb45f73b978770653a2b4 608400 btop_1.4.5-1_amd64.deb
Checksums-Sha1:
 aec3e80199084c15d3d4b172d335b0a1f908fd04 608400 btop_1.4.5-1_amd64.deb
Checksums-Sha256:
 ce8d0383e409cf135dd5da861ab51d3ead80d317ed3a91ad9f94f47f52ca4ef1 608400 btop_1.4.5-1_amd64.deb
Git-Commit-Id: aeb7f8fa34686a97e0119e59ab9972a1253c45c1
Git-Tree-Id: 6b1d2be9c89871763a9aa9944a473ee885a09282
Build-Origin: Debian
Build-Architecture: amd64

The .buildinfo files are produced by both source and binary builds.
Another option is the .changes file, which is also produced by both
source and binary builds, but I would prefer the .buildinfo file.
Modern systems archive it, and it is used, among others, by
reproducible builds when testing whether the original binary can be
reproduced. The git commit id feels like the same category of
information and is likely useful to the same set of systems that
consume .buildinfo files.
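
To illustrate how a consumer of .buildinfo files might pick up the
proposed fields, here is a minimal Python sketch. It assumes the field
names Git-Commit-Id and Git-Tree-Id from this proposal (they do not
exist in dpkg today), an unsigned file, and a hypothetical helper name
read_git_fields. It is not a full control-file parser.

```python
import re

# Field names as proposed in this mail; not an existing dpkg feature.
PROPOSED_FIELDS = ("Git-Commit-Id", "Git-Tree-Id")

# Accept full-length SHA-1 (40 hex) or SHA-256 (64 hex) object ids.
_HEX_ID = re.compile(r"(?:[0-9a-f]{40}|[0-9a-f]{64})")


def read_git_fields(control_text):
    """Return {field: value} for the proposed fields found in control_text.

    Only simple single-line "Name: value" fields are handled, which is
    all these two fields need. Continuation lines (leading whitespace,
    e.g. checksum entries) are skipped.
    """
    found = {}
    for line in control_text.splitlines():
        if not line or line[0].isspace() or ":" not in line:
            continue
        name, _, value = line.partition(":")
        value = value.strip()
        if name in PROPOSED_FIELDS and _HEX_ID.fullmatch(value):
            found[name] = value
    return found
```

A file without the fields simply yields an empty dict, which matches
the idea that the fields are optional.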

In addition to the .buildinfo file, the information could also be
recorded in .dsc files produced by source builds, and for example look
like this:

$ cat btop_1.4.5-1.dsc
Format: 3.0 (quilt)
Source: btop
Binary: btop
Architecture: linux-any
Version: 1.4.5-1
Maintainer: Otto Kekäläinen <[email protected]>
Homepage: https://github.com/aristocratos/btop
Standards-Version: 4.7.2
Vcs-Browser: https://salsa.debian.org/otto/btop
Vcs-Git: https://salsa.debian.org/otto/btop.git
Git-Commit-Id: aeb7f8fa34686a97e0119e59ab9972a1253c45c1
Git-Tree-Id: 6b1d2be9c89871763a9aa9944a473ee885a09282
Build-Depends: debhelper-compat (= 13), lowdown
Package-List:
 btop deb utils optional arch=linux-any
Checksums-Sha1:
 c824c8910994b06af7d32fb504415869191cf9ef 1250099 btop_1.4.5.orig.tar.gz
 4ba3948ae58ee2191ca9228596480888c9c09b87 6492 btop_1.4.5-1.debian.tar.xz
Checksums-Sha256:
 0ffe03d3e26a3e9bbfd5375adf34934137757994f297d6b699a46edd43c3fc02 1250099 btop_1.4.5.orig.tar.gz
 cfac4693ad56549885980adbe1a8660d63f3ba2c75c52eb655af7e2362e616fa 6492 btop_1.4.5-1.debian.tar.xz
Files:
 01d908025464b3399075c43c3b8e0fee 1250099 btop_1.4.5.orig.tar.gz
 ae418695318779673580c675ee6c16e6 6492 btop_1.4.5-1.debian.tar.xz

I intentionally placed it after the existing Vcs-Git field as the
information is related (although not necessarily linked).

## Dpkg's role is just to record, not to verify

This leads to the discussion of what these fields actually mean. They
should record the git state at the time of a dpkg-buildpackage run,
provided that the build ran in a git repository and the repository was
clean (git status --ignored shows no changes).

If there is no git repository, the fields should silently be omitted.
If there is a git repository but it is not clean, dpkg-buildpackage
could emit a warning but not do anything else. There could be a new
parameter --git-clean that turns the warning into an error and refuses
to proceed with the build unless git was clean. Alternatively, the
parameter could run the low-level equivalent of git reset --hard; git
clean -fdx, although this might be a bit dangerous: the build could
run inside a monorepo or in a home directory that is fully tracked in
git, and resetting everything could end up deleting files
unintentionally.
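
The behavior described above could be sketched roughly as follows in
Python. This is only an illustration of the decision logic, not how
dpkg (which is written in C and Perl) would implement it. The function
names and the require_clean argument (standing in for the suggested
--git-clean parameter) are hypothetical, and omitting the fields when
the tree is not clean is my reading of the rule above.

```python
import subprocess


def _git(args, cwd):
    """Run git in cwd; return stripped stdout, or None on any failure."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=False
        )
    except OSError:  # git not installed, build directory missing, ...
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def decide(inside_repo, status_output, commit_id, tree_id, require_clean=False):
    """Pure decision step; returns (fields, warning)."""
    if not inside_repo:
        return {}, None  # no repository: omit the fields silently
    # Any output line (or a failed status call, i.e. None) counts as not clean.
    if status_output != "":
        if require_clean:
            raise RuntimeError("git working tree is not clean")
        return {}, "git working tree is not clean; not recording git ids"
    if not commit_id or not tree_id:
        return {}, "could not resolve HEAD; not recording git ids"
    return {"Git-Commit-Id": commit_id, "Git-Tree-Id": tree_id}, None


def collect_git_fields(build_dir, require_clean=False):
    """Query git in build_dir and apply decide() to the result."""
    if _git(["rev-parse", "--is-inside-work-tree"], build_dir) != "true":
        return decide(False, "", None, None, require_clean)
    status = _git(["status", "--porcelain", "--ignored"], build_dir)
    commit = _git(["rev-parse", "HEAD"], build_dir)
    tree = _git(["rev-parse", "HEAD^{tree}"], build_dir)
    return decide(True, status, commit, tree, require_clean)
```

Note that the sketch never modifies the working tree; the destructive
reset/clean variant is deliberately left out.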

The command dpkg-buildpackage does not have to guarantee that the git
commit id and git tree id are correct. Anyone running
dpkg-buildpackage and having full write access to all inputs and
outputs can anyway manipulate whatever part they want. It should be
enough that dpkg-buildpackage automates recording these two git data
points in a way that makes it hard or impossible for anyone to
accidentally have them wrong. A mismatch in the values of these fields
should be solid evidence that someone intentionally manipulated the
inputs, outputs or the build process, and nobody should be able to
claim that they just didn't know what they were doing or that they ran
the wrong command by accident.

## Relation to the Vcs-Git field

These two fields are obviously related to the existing Vcs-Git field
that may be present in packages. It is important to note that
dpkg-buildpackage has no way of knowing whether the git repository it
runs in has any relation to the repository defined in the
debian/control:Vcs-Git field. Whoever is verifying the git commit id
and git tree id may find the corresponding git commit id or tree id in
the very repository that the Vcs-Git field points to, but if they are
not found there, the verifier needs to do further work, which is
entirely context dependent.

When verifying a package in official Debian repositories it might be
that the Vcs-Git field is simply outdated (e.g. points to
https://anonscm.debian.org/git/..) or that the maintainer forgot to
push local commits, or pushed the wrong commits. When verifying a
package in Ubuntu it might be that the package was modified in Ubuntu
and only the Maintainer: field was updated to point to the Ubuntu
version but not the Vcs-Git field. When verifying a 3rd party or
company internal package there might be other issues. If the package
was built from a monorepo there is a whole other set of things to
consider. This paragraph was just to illustrate that the verification
is outside of dpkg's concerns. Dpkg just needs to record the git
commit and tree id that it saw when dpkg-buildpackage ran.

## Why also git tree id?

Using git commit ids probably doesn't require an explanation as all
software developers are familiar with them. However the benefit of the
git tree id might not be obvious to everyone reading this proposal.

A git tree id is based on the file contents and attributes in a git
repository. If git commits are reordered, squashed or rebased, the git
commit ids will change, but as long as the resulting files and their
attributes are identical, the git tree id will be the same.

If workflows are clear the git commit ids should always match. If the
git commit id does not match but a comparison using the git tree id
still shows a match, that is strong evidence for a verifier that the
package contents haven't been manipulated, and that only the git
history was changed in a way that didn't affect the end result.
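
A verifier's comparison along these lines might look like the
following Python sketch. The function name and the three result labels
are made up for illustration; the dict keys are the field names
proposed in this mail.

```python
def compare_git_ids(recorded, candidate):
    """Compare ids recorded in a build artifact with those of a candidate commit.

    Both arguments are dicts with "Git-Commit-Id" and "Git-Tree-Id" keys.
    """
    if recorded["Git-Commit-Id"] == candidate["Git-Commit-Id"]:
        return "same-commit"
    if recorded["Git-Tree-Id"] == candidate["Git-Tree-Id"]:
        # History differs (e.g. rebase or squash) but file contents match.
        return "same-tree-different-history"
    return "mismatch"
```

What to do with a "mismatch" result is up to the verifier and, as
noted earlier, outside of dpkg's concerns.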

## Native vs non-native Debian package

As the fields simply record what was the git commit id and tree id at
the time of the build, if done inside a git repository and it was
clean, I don't immediately see any reason to differentiate the
behavior based on package type. Keeping it uniform and simple is
probably best.

However, these can't be mixed. What is tracked in git must be either a
native package with modifications anywhere, or a non-native package
with patches unapplied and no modifications outside the debian/
subdirectory.

## Additional fields for upstream git status?

The proposed recording of the git commit id and tree id is
specifically for the repository the Debian source or binary build was
done from. A build of a non-native package will use an orig tarball as
the upstream sources, or at least requires that the contents of the
build directory, excluding the debian/ subdirectory, match what is in
an orig tarball.

It is however common that the contents of the orig tarball are derived
from the very same Debian packaging git repository. In repositories
following DEP-14 the orig tarball can be produced from the branch
upstream/latest (and additionally the tarball might be modified with
additional info from the pristine-tar branch).

We might also want to consider tracking what the git commit id and
git tree id were when dpkg-source --build ran, and storing them
somewhere alongside the generated orig tarball, perhaps with field
names such as Orig-Git-Commit-Id and Orig-Git-Tree-Id, and eventually
passing them along so they too get recorded in the .buildinfo and .dsc
files.

These extra fields are fully optional and not needed for the primary
feature request here, but I included a description of them in case it
helps in grasping the scope and meaning of the main Git-Commit-Id and
Git-Tree-Id fields.


Sorry for the long email and I understand if you don't have time to
immediately respond, but hopefully you can at some point share what
you think about this, as the feature will never go anywhere unless you
see the merits of it and agree it would make sense.

- Otto

PS. I also included Daniel as a recipient as he tends to have good
insights on the topic and is likely able to give productive
suggestions on how to make this initial idea better.
