On 07/01/26 at 11:11 +0000, Ian Jackson wrote:
> Lucas Nussbaum writes ("Re: Include git commit id and git tree id in
> *.changes files when uploading? [and 1 more messages]"):
> > But it has something to do with upstream git commits. If
> > - upstream tarballs are generated to include the git commit used (as
> > with git-archive)
> > - and the tarball is not rewritten by uscan
> > - and pristine-tar is used
> > Then the git commit used by upstream to generate the tarball is
> > preserved in Debian's upstream (orig) tarball.
> ...
> > (as a tar pax header).
>
> Interesting. TIL that this is even possible!
>
> I think tag2upload-(re)generated origs (even without pristine-tar
> support) have the same property. They are generated with git-archive
> and the manpage suggests it includes this information unconditionally.
>
> I picked a recent tag2upload -1 upload, emacs-llama 1.0.3-1. The
> build log (sent to the debian-tag2upload list [0]) contains this:
>
> # no orig(s) in archive, generating
> + git deborig 2a89ba755b0459914a44b1ffa793e57f759a5b85
> # created orig
>
> It generated this tarball:
>
> db2efcb550a36160efc2799bc774478499ae685e40ecd709b434d65a7df894ed
> emacs-llama_1.0.3.orig.tar.xz
>
> And I see this:
>
> xzcat emacs-llama_1.0.3.orig.tar.xz | git-get-tar-commit-id
> 2a89ba755b0459914a44b1ffa793e57f759a5b85
>
> I looked in debaudit (gosh, impressive site btw) and it does show a
> previous version of this package. That was also uploaded using
> tag2upload, and also involved a tag2upload-generated orig. Your
> system says:
>
>
> https://debaudit.debian.net/git2dsc/result/9bcde2733e81c15c76c1acc09549d4358c21cc9b49d876149cf2bfdb37c27b72
> git2dsc report for emacs-llama 1.0.2-1
> 910 - git-generated dsc identical to archive dsc after normalization
>
> which I think is good?
debaudit includes two tools:
- orig-check tries to reproduce the upstream tarball
- git2dsc tries to reproduce the debian tarball and the dsc
Here we are interested in the orig tarball, so you want the orig-check
result:
https://debaudit.debian.net/orig-check/result/5777f2b988e416a9441ab179a8b9f74b015e63f482c7780f742a6adcb4019fdd
(or the previous version of the package,
https://debaudit.debian.net/orig-check/result/9bcde2733e81c15c76c1acc09549d4358c21cc9b49d876149cf2bfdb37c27b72
)
And yes, the orig tarball embeds git commit
2a89ba755b0459914a44b1ffa793e57f759a5b85,
which is the same commit as the one embedded in the upstream tarball.
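For reference, the lookup done by git-get-tar-commit-id can be approximated
in Python. This is only a minimal sketch, not that tool's implementation. It
assumes the archive came from git-archive, which records the commit in the
"comment" keyword of the pax global header. The function name
embedded_commit_id is made up for illustration.

```python
import re
import tarfile


def embedded_commit_id(fileobj):
    """Return the commit ID stored in the pax global header, or None.

    Assumption: the archive was produced by git-archive, which writes the
    commit ID as the 'comment' keyword of the pax global extended header.
    """
    # Mode "r:*" lets tarfile detect the compression (xz, gz, ...) itself.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        comment = tar.pax_headers.get("comment")
    # Only accept something shaped like a full SHA-1 or SHA-256 object name.
    if comment and re.fullmatch(r"[0-9a-f]{40}|[0-9a-f]{64}", comment):
        return comment
    return None
```

Called on an open binary file handle for emacs-llama_1.0.3.orig.tar.xz, this
would be expected to return the same ID that git-get-tar-commit-id prints, as
long as the assumption above holds.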
Of course the tarballs themselves differ (because of the leading path,
see the diffoscope output in the compare section), but they are identical
after normalization (what you call "treesame").
So that's good, except for the non-bit-identical tarballs (which could
be bit-identical if pristine-tar was used).
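To make "identical after normalization" concrete, here is a rough Python
sketch of such a comparison. It is not debaudit's actual normalization. It
only assumes that the two tarballs differ in their leading directory name,
and it compares regular file contents only (directories, symlinks, modes and
timestamps are ignored). The names tree_digest and same_tree are made up.

```python
import hashlib
import tarfile


def tree_digest(fileobj):
    """Map each regular file's path, minus its leading directory, to a SHA-256."""
    digests = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # this sketch ignores directories, symlinks, etc.
            # Drop the first path component, e.g. "pkg-1.0/" or "pkg_1.0.orig/".
            head, sep, tail = member.name.partition("/")
            relative = tail if sep else head
            data = tar.extractfile(member).read()
            digests[relative] = hashlib.sha256(data).hexdigest()
    return digests


def same_tree(a, b):
    """True if both tarballs hold the same files once the leading path is stripped."""
    return tree_digest(a) == tree_digest(b)
```

Two tarballs that pass this check can still differ byte for byte, which is
the situation described above.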
> > That's not a corner case. According to debaudit/orig-check results,
> > 57% of our packages in sid (that's 22016 packages) have an orig tarball
> > that is bit-identical to the upstream tarball downloaded by uscan.
> > Out of those 22016 orig tarballs, 7769 (35%) include a git commit
>
> So I think the existing tag2upload system makes this reliable. All
> tag2upload-generated origs should have this metadata, and furthermore
> the upstream commit mentioned will always be available and findable at
> *.dgit.debian.org, even if the Salsa repo has been moved or deleted.
>
> If we implement pristine-tar support, and users start to use it, this
> property may no longer hold: a "pristine" orig tarball from upstream
> might be lacking this particular metadata. So arguably pristine-tar
> support is a regression!
>
> The root cause of course is that "pristine" upstream tarballs are far
> from pristine. The name of the pristine-tar program is a deliberate
> joke, on the part of its author, even! What is really pristine is a
> tarball generated from git-archive, which is what you are using for
> this tracing strategy, and which is what tag2upload (without
> pristine-tar) provides.
Well, at this point, my conclusion is that if one uses the traditional
method with the best practices (such as pristine-tar), then things
are fine, and if one uses the tag2upload method with the best
practices (such as using the upstream git tree, not gbp import-orig) then
things are also fine.
But many things can go in a suboptimal way in both cases, and it's
difficult to say which one is better in terms of traceability.
> > For example, interestingly, there are 815 packages where the orig
> > tarball commit does not match a freshly downloaded upstream tarball.
> > A few examples:
> > https://debaudit.debian.net/orig-check/result/00ea060645a90efd84709fa609b02a40081c9dcb0274619cc8246e38f87af1e2
> > https://debaudit.debian.net/orig-check/result/015c69f5273e494330073760c1c3b27385d1057c35ceb25dca3a7e90c3d1c8ac
> > https://debaudit.debian.net/orig-check/result/01f5dba7b0712cad020f624c5ca28151746845bae88cf7af8a51ed2aa612e08a
> > https://debaudit.debian.net/orig-check/result/020f4cd9d4a34aae99df22649ec792d1d53faf1a7bc4c7366d285ec3176b798c
> > https://debaudit.debian.net/orig-check/result/02227b8efcf6e905f919f65cb0eb85ee975b925cd305a7db33ed1c8ea6c3bf33
>
> Interesting. Are you able to easily search for such situations where
> the upload was done with tag2upload?
There are 801 source packages uploaded using tag2upload in sid,
distributed as follows:
 native | new_upstream | orig_commit_not_null | uscan_commit_not_null | commits_eq | count
--------+--------------+----------------------+-----------------------+------------+-------
 f      | f            | f                    | f                     |            |   173
 f      | f            | f                    | t                     |            |    35
 f      | f            | t                    | f                     |            |    70
 f      | f            | t                    | t                     | f          |    55
 f      | f            | t                    | t                     | t          |   150
 f      | t            | f                    | f                     |            |     1
 f      | t            | t                    | f                     |            |   103
 f      | t            | t                    | t                     | f          |   143
 f      | t            | t                    | t                     | t          |    48
 t      | f            | f                    | f                     |            |    23
It only makes sense to look at non-native source packages, and uploads
of new upstream versions (since the orig tarball might have been
uploaded without t2u in a previous upload):
 orig_commit_not_null | uscan_commit_not_null | commits_eq | count
----------------------+-----------------------+------------+-------
 f                    | f                     |            |     1
 t                    | f                     |            |   103
 t                    | t                     | f          |   143
 t                    | t                     | t          |    48
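In case the columns are not self-explanatory, here is a small Python sketch
of how a package could end up in one of these rows. It assumes the columns
mean what their names suggest: whether a commit ID was found in the archive
orig tarball, whether one was found in the tarball downloaded by uscan, and
whether the two are equal. The empty commits_eq cells correspond to None
(SQL NULL) here. The function classify is hypothetical, not the query that
produced the table.

```python
from typing import Optional, Tuple


def classify(
    orig_commit: Optional[str], uscan_commit: Optional[str]
) -> Tuple[bool, bool, Optional[bool]]:
    """Return (orig_commit_not_null, uscan_commit_not_null, commits_eq)."""
    orig_known = orig_commit is not None
    uscan_known = uscan_commit is not None
    # The comparison is only meaningful when both commit IDs are known;
    # otherwise it stays undefined, like the blank cells in the table.
    commits_eq = (orig_commit == uscan_commit) if (orig_known and uscan_known) else None
    return orig_known, uscan_known, commits_eq
```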
The upload for the first line was
https://debaudit.debian.net/orig-check/result/ef937f9147919a7eee78549c2dfc037abcec59bdc6a9cfeb27457db8677bd749
which involved repacking, and probably losing the git commit in the
process.
The second line is also OK: there's no upstream git commit to compare
with.
The third line is the interesting one. Some examples are caused by
repacking, like
https://debaudit.debian.net/orig-check/result/0b67b4adc5a7fa1b898a29b025e1ac7b9decee8edb666033a4112a64665ba404
A detailed list of those 143 packages is available at
https://people.debian.org/~lucas/t2u_wrong_commit.txt
What's also interesting is that, out of the 191 packages that can be
compared with an upstream commit, only 48 are matches.
That's very different from the proportion with non-t2u uploads:
 orig_commit_not_null | uscan_commit_not_null | commits_eq | count
----------------------+-----------------------+------------+-------
 f                    | f                     |            |  8161
 f                    | t                     |            |   935
 t                    | f                     |            |  1219
 t                    | t                     | f          |   255
 t                    | t                     | t          |  4380
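To put numbers on that difference, the match rate among comparable packages
(those where both commits are known) can be computed from the counts in the
two tables. A small Python sketch, where match_rate is just an illustrative
helper:

```python
def match_rate(matching: int, mismatching: int) -> float:
    """Share of comparable packages whose orig and upstream commits agree."""
    return matching / (matching + mismatching)


# Counts taken from the commits_eq = t / f rows of the tables above.
t2u = match_rate(48, 143)         # new upstream versions uploaded with tag2upload
non_t2u = match_rate(4380, 255)   # non-tag2upload uploads

print(f"t2u:     {t2u:.1%}")      # 25.1%
print(f"non-t2u: {non_t2u:.1%}")  # 94.5%
```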
Lucas