Colin Watson writes ("Bug#1130552: Wanted: idempotent dput (but that may be
impossible)"):
> The sftp method itself doesn't involve removing anything, but the system
> as a whole does. [more explanation]
Ah, I see.
> The target of sftp will typically be some kind of queue. The thing
> processing the queue typically removes the upload when it's finished.
> If you have a shared queue directory (as I think most of these things
> do, with the exception of Launchpad), then that means that things can go
> wrong if you have two uploads in the queue at the same time that both
> mention the same .orig.tar file in their .changes. Having dput fail all
> but one of the uploads in that situation seems strictly better to me,
> since it allows outer systems to retry cleanly rather than apparently
> succeeding and then resulting in emailed error messages later.
In principle a protocol that would work reliably would be:
* Consumer doesn't delete .origs when it processes an upload
* Instead, origs with "old enough" dates are deleted
* Use some mode of sftp that overwrites files in place without
  truncating, so that (i) the timestamp is updated by the write()
  calls but (ii) the file contents are not changed (since we write
  the same data)
> Having per-upload queue directories is strictly better for this purpose,
> but it's also rather a lot of effort to retrofit, perhaps even
> infeasible.
Right.
> And historically it hasn't been entirely without its
> downsides, since it normally means that people need to resume large
> uploads from scratch. (Of course, t2u offers a quite different solution
> to that problem.)
Well, t2u doesn't actually *solve* this problem. It moves the problem
from the user to the tag2upload service. The t2u service has
hopefully-better connectivity, but when it does fail it is much more
annoying since you have to burn the version number.
One other possible option which would solve the problem just for t2u
would be for the tag2upload service to have a pet instance of the
queue daemon. But really I think we ought to be able to get to a
situation where entities that need to do uploads can do so reliably
despite the fact that the public internet is not reliable.
> With my proposal, would it be sufficient for t2u to just use `dput
> --force`?
For me the question with this is: what is special about tag2upload
that makes this option correct?
I think what we have here is a difference of likelihood, but not a
difference of fundamental principles.
Presumably debusine is troubled by #1133774 because it's normal with
debusine to make many similar uploads, perhaps even simultaneously.
Presumably they have different versions, but they can of course have
the same upstream version, and so the same origs.
Whereas tag2upload is troubled by the inability to retry the dput
because redoing the whole t2u processing is annoying and
labour-intensive, and the official archive isn't likely to see
concurrent or nearly-concurrent uploads of a package with the same
.orig.
(This orig stuff causes other kinds of races with tag2upload, too -
archive processing delays and the general nontransactional nature of
the archive can mean that the origs that the user intended are hard for
the t2u service to even find. We have largely worked around these.)
If this analysis is right then maybe whether overwriting is a good
idea is a property of the upload target. (If we can't do a *proper*
job without unreasonable effort, which I am quite prepared to
believe.)
Ian.
--
Ian Jackson <[email protected]> These opinions are my own.
Pronouns: they/he. If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.