"Theodore Ts'o" <ty...@mit.edu> writes:

>> 1) Use upstream's PGP signed git-archive tarball.
>
> Here's how I do it in e2fsprogs which (a) makes the git-archive
> tarball be bit-for-bit reproducible given a particular git commit ID,
> and (b) minimizes the size of the tarball when stored using
> pristine-tar:
>
> https://github.com/tytso/e2fsprogs/blob/master/util/gen-git-tarball

Wow, written five years ago and basically the same thing that I suggest
(although you store pre-generated ./configure scripts in git).

Going into detail, you use 'gzip -9n' but I use the git-archive
defaults, which are the same as -n aka --no-name.  I agree adding -9 aka
--best is an improvement.  Gnulib's maint.mk also adds --rsyncable,
would you agree that this is also an improvement?  Thus what I'm
arriving at is this:

git archive --prefix=inetutils-$(git describe)/ HEAD |
   gzip --no-name --best --rsyncable > inetutils-$(git describe)-src.tar.gz
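
A minimal sketch of how one could check locally that such a pipeline
gives the same bytes every time, assuming git, GNU gzip with --rsyncable
support, and a repository where 'git describe' finds a tag.  The
function names make_tarball and check_reproducible are made up for
illustration.  Building twice on one machine only shows local
determinism, not that other git or gzip versions agree.

```shell
# make_tarball NAME OUT
# Write a gzip-compressed git-archive of HEAD to OUT, using
# NAME-<git describe>/ as the directory prefix inside the tarball.
make_tarball() {
    git archive --prefix="$1-$(git describe)/" HEAD |
        gzip --no-name --best --rsyncable > "$2"
}

# check_reproducible NAME
# Build the tarball twice and compare the two results byte for byte.
check_reproducible() {
    a=$(mktemp)
    b=$(mktemp)
    make_tarball "$1" "$a"
    make_tarball "$1" "$b"
    if cmp -s "$a" "$b"; then
        echo "reproducible: $(sha256sum < "$a" | cut -d' ' -f1)"
        status=0
    else
        echo "tarballs differ" >&2
        status=1
    fi
    rm -f "$a" "$b"
    return "$status"
}

# Usage, from inside the git checkout:
#   check_reproducible inetutils
```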

>> To reach our goals in the beginning of this post, this upstream tarball
>> has to be filtered to remove all pre-generated artifacts and vendored
>> code.  Use some mechanism, like the debian/copyright Files-Excluded
>> mechanism to remove them.  If you used a git-archive upstream tarball,
>> chances are higher that you won't have to do a lot of work especially
>> for pre-generated scripts.
>
> Why does it *have* to be filtered?  For the purposes of building, if
> you really want to nuke all of the pre-generated files, you can just
> move them out of the way at the beginning of the debian/rules run, and
> then move them back as part of "debian/rules clean".  Then you can use
> autoreconf -fi to your heart's content in debian/rules (modulo
> possibly breaking things if you insist on nuking aclocal.m4 and
> regenerating it without taking proper care, as discussed above).
>
> This also allows the *.orig.tar.gz to be the same as the upstream
> signed PGP tarball, which you've said is the ideal, no?

Right, there is no requirement for orig.tar.gz to be filtered.  But then
the outcome depends on upstream, and I don't think we can convince all
upstreams about these concerns.  Most upstreams prefer to ship
pre-generated and vendored files in their tarballs, and will continue to
do so.  Let's assume upstream doesn't ship minimized tarballs that are
free from vendored or pre-generated files.  That's the case for most
upstream tarballs in Debian today (including e2fsprogs, openssh,
coreutils).  Without filtering that tarball we won't fulfil the goals I
mentioned in the beginning of my post.  The downsides of not filtering
include (somewhat repeating myself):

- It opens the door to bugs where pre-generated files are not
  re-generated even though they are used to build the package.  I think
  this is fairly common in Debian packages.  Making sure all
  pre-generated files are re-generated during build -- or confirming
  that a file is not used at all -- is tedious and fragile work that has
  to be done for every release.  Are you certain that ./configure is
  re-generated?  If it were not present in the tarball, you would
  notice.

- Auditing the pre-generated and vendored files for malicious content
  takes more time than not having to audit those files, especially if
  those files are not stored in upstream git.

- Pre-generated and vendored files trigger licensing concerns and
  require tedious work that doesn't improve the binary package
  deliverable.  Consider files like texinfo.tex: wouldn't it be better
  to strip that out of tarballs and not have to add it to
  debian/copyright?  If some code in a package, let's say getopt.c, is
  not used during build of the package, the license of that file doesn't
  have to be mentioned in debian/copyright if I understand correctly:
  https://www.debian.org/doc/debian-policy/ch-archive.html#s-pkgcopyright
  If, a few releases later, that file starts to get used during
  compilation, the package maintainer will likely not notice.  If it was
  filtered, the maintainer would notice.
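
To make the first point concrete: one rough way to spot files that were
shipped pre-generated and left untouched by the build is to checksum
them before the build and compare afterwards.  A minimal sketch,
assuming GNU coreutils; the function names snapshot and not_regenerated
are made up for illustration.  A file that is re-generated with
byte-identical content is reported too, so treat the output as a list of
candidates to look at, not as proof.

```shell
# snapshot FILE...
# Print "checksum  path" lines for the given files (run before the build).
snapshot() {
    sha256sum -- "$@"
}

# not_regenerated MANIFEST
# Print every path from MANIFEST whose content is still identical to the
# recorded checksum (run after the build).  Missing files are skipped.
not_regenerated() {
    while read -r sum path; do
        if [ -f "$path" ]; then
            now=$(sha256sum -- "$path" | cut -d' ' -f1)
            if [ "$now" = "$sum" ]; then
                printf '%s\n' "$path"
            fi
        fi
    done < "$1"
}

# Usage, from the unpacked source tree:
#   snapshot configure Makefile.in aclocal.m4 > ../pregen.sha256
#   ... build the package ...
#   not_regenerated ../pregen.sha256
```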

The best case is when upstream ships a tarball consistent with what I
dream *.orig.tar.gz should be: free of vendored and pre-generated files.
Debian package maintainers can take action before this happens, to get
these nice properties within Debian.  Maybe some upstreams will adapt.
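
Inside Debian the usual tool for this is the Files-Excluded field in
debian/copyright, which uscan honours when repacking.  For illustration
only, here is a rough sketch of the same idea in plain shell, assuming
GNU tar and gzip.  filter_tarball is a hypothetical helper, not an
existing tool, and the patterns are ordinary tar --exclude patterns.

```shell
# filter_tarball SRC DST PATTERN...
# Repack SRC (a .tar.gz) into DST, leaving out members that match any
# of the given PATTERNs.
filter_tarball() {
    src=$1
    dst=$2
    shift 2
    work=$(mktemp -d)
    tar -xzf "$src" -C "$work"
    # Turn each remaining argument into a --exclude option for GNU tar.
    count=$#
    for pattern in "$@"; do
        set -- "$@" "--exclude=$pattern"
    done
    shift "$count"
    # Sorted member names and fixed ownership keep the output stable;
    # file timestamps are carried over from the original tarball.
    tar -C "$work" "$@" --sort=name --owner=0 --group=0 --numeric-owner \
        -cf - . | gzip --no-name --best > "$dst"
    rm -rf "$work"
}

# Usage:
#   filter_tarball foo-1.0.tar.gz foo-1.0+ds.tar.gz configure texinfo.tex
```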

>> There is one design of gnulib that is important to understand: gnulib is
>> a source-only library and is not versioned and has no release tarballs.
>> Its release artifact is the git repository containing all the commits.
>> Packages like coreutils, gzip, tar etc pin to one particular commit of
>> gnulib.
>
> Note that how we treat gnulib is a bit different from how we treat
> other C shared libraries, where we claim that *all* libraries must be
> dynamically linked, and that including source code by reference is
> against Debian Policy, precisely because of the toil needed to update
> all of the binary packages should some security vulnerability get
> discovered in the library which is either linked statically or
> included by code duplication.
>
> And yet, we seem to have given a pass for gnulib, probably because it
> would be too awkward to enforce that rule *everywhere*, so apparently
> we've turned a blind eye.
>
> I personally think the "everything must be dynamically linked" to be
> not really workable in real life, and should be an aspirational goal
> --- and the fact that we treat gnulib differently is a great proof
> point about how the current debian policy is not really doable in real
> life if it were enforced strictly, everywhere, with no exceptions....
>
> Certainly for languages like Rust, it *can't* be enforced, so again,
> that's another place where that rule is not enforced consistently; if
> it were, we wouldn't be able to ship Rust programs.

Agreed.  I think the policy is mostly a good one, but when there are
special situations like gnulib, Rust, Go etc we need some tools to
handle them.  Debian won't turn gnulib into a shared library.  Debian
won't turn Go into a shared library ecosystem (or maybe Go will actually
go in that direction, but it is a slow process).  I don't know Rust
well but I suppose it is similar.

/Simon
