On 2023-01-15 05:21, Bruno Haible wrote:
> Reproducibility is about verifying that an artifact A was generated
> from a source S.
Quite true. However, there's something else going on: when I do an 'ls
-l' of a source directory that I got from a distribution tarball, it's
useful to see the last time the contents of each source file were changed
upstream. When sources are in a Git repository, I've found the commit
timestamp to be a good representation for that.
For TZDB, where users have long wanted reproducibility, I use something
like this in a Makefile recipe for each source file $$file:
	time=`git log -1 --format='tformat:%ct' $$file` && \
	touch -cmd @$$time $$file
Here are three problems I ran into with this approach, and the solutions
that TZDB uses:
1. As you mentioned, what if you're building a release from sources that
have not yet been committed? In this case TZDB's Makefile recipe warns
but goes ahead with the timestamp that the working file already has.
2. What about platform-independent files that are automatically created
from source files from the repository, and that are shipped in the
release tarball? In this case, the TZDB Makefile arranges for each such
file to have a timestamp one second later than the maximum of timestamps
of files that the file depends on. This step is the biggest hassle,
since it means I need to repeat in the Makefile the logic that 'make'
already uses when calculating dependencies.
3. What about tarball metadata other than last-modified time? Here, TZDB
uses the following GNU Tar options:
  GNUTARFLAGS= --format=pax --pax-option='delete=atime,delete=ctime' \
    --numeric-owner --owner=0 --group=0 \
    --mode=go+u,go-w --sort=name
The need for most of this should be obvious, if one wants the tarball to
be reproducible. However, some details are less obvious. GNUTARFLAGS
specifies pax format because the default GNU Tar format becomes
unportable after 2242-03-16 12:56:31 UTC (2**33 - 1 seconds after the
Epoch) due to the 33-bit limitation of ustar timestamps. And GNUTARFLAGS
uses delete=atime,delete=ctime so that atime and ctime do not leak into
the tarball and make it less reproducible. Since mtime values are always
a multiple of 1 second (given steps 1 and 2), this means the tarball
will be ustar-compatible until 2242, giving users *plenty* of time to
prepare for pax format timestamps.
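Steps 1 and 2 can be sketched as a pair of shell functions. This is
only an illustration, not TZDB's actual Makefile logic: the function
names set_commit_mtime and set_generated_mtime are hypothetical, and
the sketch assumes that git, GNU touch and GNU stat are available.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical, not from TZDB.
# Assumes git, GNU touch (-d @SECONDS) and GNU stat (-c %Y).

# Step 1: set FILE's mtime to the time of the last commit touching it.
# If FILE is untracked or has uncommitted changes, warn and leave
# its current timestamp alone.
set_commit_mtime() {
  file=$1
  time=
  if test -z "$(git status --porcelain -- "$file")"; then
    time=$(git log -1 --format='tformat:%ct' -- "$file")
  fi
  if test -z "$time"; then
    echo >&2 "warning: $file: not committed; keeping its timestamp"
    return 0
  fi
  touch -cmd "@$time" "$file"
}

# Step 2: give generated file TARGET an mtime one second later than
# the newest mtime among the files DEP... that it depends on.
set_generated_mtime() {
  target=$1
  shift
  max=0
  for dep in "$@"; do
    t=$(stat -c %Y -- "$dep") || return
    if test "$t" -gt "$max"; then
      max=$t
    fi
  done
  touch -cmd "@$((max + 1))" "$target"
}
```

For example, 'set_generated_mtime out.txt in1.txt in2.txt' (with
made-up file names) would stamp out.txt one second after the newer of
in1.txt and in2.txt. The dependency list still has to be spelled out by
hand, which is the hassle described in step 2.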
There is an argument that we need not have a fancy GNUTARFLAGS like
this, because I'm signing the tarballs and users have to trust me
anyway. Still, some users want to "trust but verify" and a reproducible
tarball is easier to audit than a non-reproducible one, so for these
users it can be a win to omit the irrelevant data from the tarball.
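A user who wants to "trust but verify" might check the effect of these
flags roughly as follows. This is a sketch under assumptions rather
than TZDB's release procedure: make_tarball is a hypothetical helper,
the flags are copied from GNUTARFLAGS above, and GNU Tar recent enough
to support --sort=name is assumed to be installed as 'tar'.

```shell
#!/bin/sh
# Sketch only: make_tarball is a hypothetical helper, not part of TZDB.
# Assumes 'tar' is GNU Tar with support for --sort=name.

# Same flags as GNUTARFLAGS in the message above.
GNUTARFLAGS="--format=pax --pax-option=delete=atime,delete=ctime \
  --numeric-owner --owner=0 --group=0 \
  --mode=go+u,go-w --sort=name"

# make_tarball DIR OUT: archive the contents of DIR into the file OUT.
# $GNUTARFLAGS is deliberately unquoted so that it splits into words.
make_tarball() {
  (cd "$1" && tar $GNUTARFLAGS -cf - .) > "$2"
}
```

Two trees with the same contents and last-modified times should then
yield identical tarballs even if their atime, ctime and ownership
differ, so 'cmp first.tar second.tar' (hypothetical file names) is
enough to compare a local rebuild against a published tarball.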