Package: src:dpkg
Severity: wishlist
X-Debbugs-Cc: [email protected]

Hi Guillem,

I wanted to continue the design discussion about a potential new feature in dpkg to record the git commit id and git tree id in the build artifacts. You already provided a "shallow pass" in
https://lists.debian.org/debian-devel/2025/12/msg00150.html and
https://lists.debian.org/debian-devel/2025/12/msg00152.html

After thinking about this, I'd like to propose the following high-level design as what I as a dpkg user would find most useful and conceptually clearest for both packagers and supply-chain verifiers.

## New records in build artifacts for git commit id and git tree id

The goal would be to ensure that both source and binary builds record somewhere what git commit id (git rev-parse HEAD) and what git tree id (git rev-parse HEAD^{tree}) was used for the build.

It could be, for example, in the .buildinfo file and look something like this:

    $ cat btop_1.4.5-1_amd64.buildinfo
    Format: 1.0
    Source: btop
    Binary: btop
    Architecture: amd64
    Version: 1.4.5-1
    Checksums-Md5:
     149c7e4ae56bb45f73b978770653a2b4 608400 btop_1.4.5-1_amd64.deb
    Checksums-Sha1:
     aec3e80199084c15d3d4b172d335b0a1f908fd04 608400 btop_1.4.5-1_amd64.deb
    Checksums-Sha256:
     ce8d0383e409cf135dd5da861ab51d3ead80d317ed3a91ad9f94f47f52ca4ef1 608400 btop_1.4.5-1_amd64.deb
    Git-Commit-Id: aeb7f8fa34686a97e0119e59ab9972a1253c45c1
    Git-Tree-Id: 6b1d2be9c89871763a9aa9944a473ee885a09282
    Build-Origin: Debian
    Build-Architecture: amd64

The .buildinfo files are produced by both source and binary builds. Another option is the .changes file which is also produced by both source and binary builds, but I would prefer the .buildinfo file, as modern systems archive that file and it is used among others by reproducible builds when testing if the original binary can be reproduced, so having git commit id there feels like the same category of information and likely useful to the same set of systems consuming .buildinfo files.

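
To illustrate how a consumer of .buildinfo files might pick up the two fields, here is a rough Python sketch. The helper name read_git_fields is made up for this example, and the Git-Commit-Id and Git-Tree-Id fields exist only in this proposal, not in current dpkg output. The parsing is deliberately simplistic and only looks at top-level "Name: value" lines.

```python
def read_git_fields(control_text):
    """Return the proposed Git-* fields found in a deb822-style text.

    Hypothetical helper: Git-Commit-Id and Git-Tree-Id are only proposed
    in this report and are not emitted by dpkg today.
    """
    wanted = {"git-commit-id", "git-tree-id"}
    found = {}
    for line in control_text.splitlines():
        # Continuation lines (such as checksum entries) start with whitespace.
        if not line or line[0].isspace() or ":" not in line:
            continue
        name, _, value = line.partition(":")
        if name.lower() in wanted:
            found[name] = value.strip()
    return found


example = """\
Format: 1.0
Source: btop
Checksums-Sha256:
 ce8d0383e409cf135dd5da861ab51d3ead80d317ed3a91ad9f94f47f52ca4ef1 608400 btop_1.4.5-1_amd64.deb
Git-Commit-Id: aeb7f8fa34686a97e0119e59ab9972a1253c45c1
Git-Tree-Id: 6b1d2be9c89871763a9aa9944a473ee885a09282
Build-Origin: Debian
"""

print(read_git_fields(example))
```

A file without the fields simply yields an empty dict, which matches the idea that the fields are optional.
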
In addition to the .buildinfo file, the information could also be recorded in .dsc files produced by source builds, and for example look like this:

    $ cat btop_1.4.5-1.dsc
    Format: 3.0 (quilt)
    Source: btop
    Binary: btop
    Architecture: linux-any
    Version: 1.4.5-1
    Maintainer: Otto Kekäläinen <[email protected]>
    Homepage: https://github.com/aristocratos/btop
    Standards-Version: 4.7.2
    Vcs-Browser: https://salsa.debian.org/otto/btop
    Vcs-Git: https://salsa.debian.org/otto/btop.git
    Git-Commit-Id: aeb7f8fa34686a97e0119e59ab9972a1253c45c1
    Git-Tree-Id: 6b1d2be9c89871763a9aa9944a473ee885a09282
    Build-Depends: debhelper-compat (= 13), lowdown
    Package-List:
     btop deb utils optional arch=linux-any
    Checksums-Sha1:
     c824c8910994b06af7d32fb504415869191cf9ef 1250099 btop_1.4.5.orig.tar.gz
     4ba3948ae58ee2191ca9228596480888c9c09b87 6492 btop_1.4.5-1.debian.tar.xz
    Checksums-Sha256:
     0ffe03d3e26a3e9bbfd5375adf34934137757994f297d6b699a46edd43c3fc02 1250099 btop_1.4.5.orig.tar.gz
     cfac4693ad56549885980adbe1a8660d63f3ba2c75c52eb655af7e2362e616fa 6492 btop_1.4.5-1.debian.tar.xz
    Files:
     01d908025464b3399075c43c3b8e0fee 1250099 btop_1.4.5.orig.tar.gz
     ae418695318779673580c675ee6c16e6 6492 btop_1.4.5-1.debian.tar.xz

I intentionally placed it after the existing Vcs-Git field as the information is related (although not necessarily linked).

## Dpkg's role is just to record, not to verify

This leads to the discussion of what this field actually means. It should record what was the git state for a dpkg-buildpackage run if the build ran in a git repository, and the repository was clean (git status --ignored shows no changes). If there is no git repository, the fields should silently be omitted. If there is a git repository but it is not clean, dpkg-buildpackage could emit a warning but not do anything else.

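
The decision logic above could be sketched roughly as follows. This is only an illustration in Python, not a suggestion for how dpkg itself should implement it. The function names are made up, the field names are the proposed ones, and I am assuming that the fields are left out when the tree is not clean. The git runner is passed in as a parameter so the logic can be exercised without a real repository.

```python
import subprocess


def _run_git(args, cwd="."):
    """Run git and return (exit_code, stdout); (127, "") if git is missing."""
    try:
        proc = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return 127, ""
    return proc.returncode, proc.stdout


def git_fields_for_build(run=_run_git):
    """Return (fields, warning) for the current directory.

    fields holds the two proposed field names, or is empty when they
    should be omitted; warning is None or a message for the dirty case.
    """
    code, out = run(["rev-parse", "--is-inside-work-tree"])
    if code != 0 or out.strip() != "true":
        return {}, None  # no git repository: omit silently

    # Any output means modified, untracked or ignored files are present.
    code, out = run(["status", "--porcelain", "--ignored"])
    if code != 0 or out.strip():
        # Assumption: a dirty tree means warn and record nothing.
        return {}, "git working tree is not clean; Git-* fields not recorded"

    code_c, commit = run(["rev-parse", "HEAD"])
    code_t, tree = run(["rev-parse", "HEAD^{tree}"])
    if code_c != 0 or code_t != 0:
        return {}, None  # for example a repository without any commit yet
    return {"Git-Commit-Id": commit.strip(), "Git-Tree-Id": tree.strip()}, None
```

Calling git_fields_for_build() without arguments uses the real git binary in the current directory.
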
There could be a new parameter --git-clean that turns the warning into an error and refuses to proceed with the build unless the git working tree is clean. Alternatively, the parameter could run the low-level equivalent of git reset --hard; git clean -fdx, although this could be dangerous: the build might run inside a monorepo or in a home directory that is fully tracked in git, and resetting everything could delete files the user wanted to keep.

The command dpkg-buildpackage does not have to guarantee that the git commit id and git tree id are correct. Anyone running dpkg-buildpackage with full write access to all inputs and outputs can manipulate whatever part they want anyway. It should be enough that dpkg-buildpackage automates recording these two git data points in a way that makes it hard or impossible for anyone to get them wrong by accident. A mismatch in the values of these fields should then be solid evidence that someone intentionally manipulated the inputs, outputs or the build process, and nobody should be able to claim that they just didn't know what they were doing or that they ran the wrong command by accident.

## Relation to the Vcs-Git field

These two fields are obviously related to the existing Vcs-Git field that may be present in packages. It is important to note that dpkg-buildpackage has no way of knowing whether the git repository it runs in has any relation to the git repository defined in the debian/control:Vcs-Git field.

Whoever is verifying the git commit id and git tree id may find the corresponding git commit id or tree id in the very repository that the Vcs-Git field points to, but if they are not found there, the verifier needs to do further work, which is entirely context dependent.

When verifying a package in official Debian repositories it might be that the Vcs-Git field is simply outdated (e.g. points to https://anonscm.debian.org/git/..) or that the maintainer forgot to push local commits, or pushed the wrong commits.
When verifying a package in Ubuntu it might be that the package was modified in Ubuntu and only the Maintainer: field was updated to point to the Ubuntu version but not the Vcs-Git field. When verifying a 3rd party or company internal package there might be other issues. If the package was built from a monorepo there is a whole other set of things to consider.

This paragraph was just to illustrate that the verification is outside of dpkg's concerns. Dpkg just needs to record the git commit and tree id that it saw when dpkg-buildpackage ran.

## Why also git tree id?

Using git commit ids probably doesn't require an explanation as all software developers are familiar with them. However, the benefit of the git tree id might not be obvious to everyone reading this proposal.

A git tree id is computed from the file names, modes and contents in a git repository, not from the commit history. If git commits are reordered, squashed or rebased, the git commit ids will change, but as long as the resulting file contents are still the same, the git tree id will be the same.

If workflows are clear the git commit ids should always match. If the git commit id does not match but a comparison using the git tree id still shows a match, that is enough for a verifier to conclude that the package contents likely haven't been manipulated, and that only the git commits were changed in a way that didn't affect the end result.

## Native vs non-native Debian package

As the fields simply record what was the git commit id and tree id at the time of the build, if done inside a git repository and it was clean, I don't immediately see any reason to differentiate the behavior based on package type. Keeping it uniform and simple is probably best.

However, these can't be mixed. What is tracked in git must be either a native package with modifications anywhere, or a non-native package with patches unapplied and no modifications outside the debian/ subdirectory.

## Additional fields for upstream git status?

The proposed recording of git commit id and tree id is specifically for the repository the Debian source or binary build was done from. A build of a non-native package will be using an orig tarball as the upstream sources, or at least requires that the contents in the build directory excluding the debian/ subdirectory match what is in an orig tarball.

It is however common that the contents of the orig tarball are derived from the very same Debian packaging git repository. In repositories following DEP-14 the orig tarball can be produced from the branch upstream/latest (and additionally the tarball might be modified with additional info from the pristine-tar branch).

We might also want to consider tracking what was the git commit id and git tree id when dpkg-source --build ran, and store it somewhere alongside the generated orig tarball, perhaps with field names such as Orig-Git-Commit-Id and Orig-Git-Tree-Id, and eventually pass them along so they too get recorded into the .buildinfo and .dsc files.

These extra fields are fully optional and not needed for the primary feature request here, but I included a description of them in case it helps in grasping the scope and meaning of the main Git-Commit-Id and Git-Tree-Id fields.

Sorry for the long email and I understand if you don't have time to immediately respond, but hopefully you can at some point share what you think about this, as the feature will never go anywhere unless you see the merits of it and agree it would make sense.

- Otto

PS. I also included Daniel as a recipient as he tends to have good insights on the topic and is likely able to give productive suggestions on how to make this initial idea better.

