>>>>> "Neil" == Neil Williams <[email protected]> writes:

Thought I'd just point out where my paranoia about proper source version
matching comes from:

 * we're using architecture powerpcspe from debian-ports.  debian-ports
   doesn't carry sources, and is out of sync with normal debian mirrors,
   which makes it pretty difficult to satisfy the GPL.

 * we're using our own, custom mirroring tool [1] to overcome the
   limitations of debian-ports.  this tool is named 'debparanoia' for a
   reason, as it double checks that matching source packages are present
   for all .debs.

 * Since debparanoia is used for mirroring as well as for
   "license-checking" our images, I don't really care whether multistrap
   does the source package version check.  I just thought it would be
   good anyway to eliminate the slightest chance of 'apt-get source'
   failing to satisfy the GPL by fetching the wrong package versions
   without the user noticing.

Below I'll try again to prove my point, sorry if this is getting off
topic and wasting your time :)

> On Wed, 15 Jun 2011 23:22:25 +0200
> David Kuehling <[email protected]> wrote:
>> Are you assuming a non-changing archive?

> No, I'm assuming a decent mirroring tool. There is also a need to
> understand exactly what apt is doing with apt-get source. I don't
> think you've got that clear.

I'm sorry if I gave the impression of not understanding apt or archives
:) Maybe my last mail was just written too hastily.  I think I understand
that stuff pretty well.

>> As soon as the archive changes non-atomically (with locking applied
>> by the client) I think we're doomed.

> Broken mirror resulting from an inadequate mirroring tool. Not
> something which either apt or multistrap can fix or avoid.

I'm willing to believe that debian mirror updates are atomic.

[..]

>> * Concurrently I run 'multistrap' which runs 'apt-get update',
>> fetching dists/sid/main/binary-amd64/Packages.bz2 and
>> sid/main/source/Sources.bz2 .
>> 
>> * Now I get Packages.bz2 from before the mirror update, and
>> Sources.bz2 from after the mirror update.

> No you don't. You get Packages.bz2 and Sources.bz2 in sync at the same
> time in the same apt-get update call. Indeed in most cases, as apt is
> using parallel connections, Packages and Sources are downloaded at the
> same time over multiple sockets. What happens afterwards is that
> apt-get source uses that cached data to get the sources.

Well, this is the part that only works if you cross your fingers.
Nothing guarantees that 'apt-get update' schedules Packages.bz2 and
Sources.bz2 for simultaneous download.  In fact, when I type 'apt-get
update' on my PC, it first downloads 4 Sources files, then 4 Packages
files.

This race can be detected by checking Packages.bz2 and Sources.bz2
against the checksums present in the Release file.  Not sure whether
that's implemented.  For me, 'multistrap' happily uses repositories
without checksums in the Release file.

> If a new version has arrived in the meantime then the old source (the
> same version as the version of the binary downloaded earlier) will
> still be obtained because apt only has cached data for the downloaded
> source version, the Sources file already downloaded before the new
> version arrived. In most cases, the old version will remain for at
> least 10 days because it's the version currently in testing. 
[..]

I do not contest that debian archives carry source packages for all
binary packages in the pool.  I only claim that the .deb packages
referenced by apt's cache do not necessarily match the source packages
referenced by the cache.  This condition would result in a license
violation for people who rely on 'apt-get source' to satisfy the GPL.
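To make the condition concrete, here is a minimal sketch of the kind of
check I mean.  It is not how debparanoia, apt or multistrap actually do
it; the function names are made up for illustration.  It assumes the
indices are already decompressed, that a binary stanza names its source
in an optional "Source: name (version)" field, and that the source
name and version default to the binary's own when that field or its
version part is missing.

```python
import re

def parse_stanzas(text):
    """Yield one dict of fields per stanza of a Packages/Sources index.

    Continuation lines (those starting with whitespace) are ignored,
    which is enough for the Package, Version and Source fields.
    """
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            if line and not line[0].isspace() and ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            yield fields

def source_of(binary):
    """Return the (source name, source version) a binary stanza refers to."""
    m = re.match(r"(\S+)(?:\s+\((.+)\))?$",
                 binary.get("Source", binary["Package"]))
    return m.group(1), m.group(2) or binary["Version"]

def missing_sources(packages_text, sources_text):
    """List (name, version) pairs needed by Packages but absent from Sources."""
    available = {(s["Package"], s["Version"])
                 for s in parse_stanzas(sources_text)}
    needed = {source_of(b) for b in parse_stanzas(packages_text)}
    return sorted(needed - available)
```

An empty result means every binary in the cached Packages index has a
matching entry in the cached Sources index; anything else is exactly
the mismatch described above.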

A version mismatch will occur exactly when a mirror update occurs
between the download of Sources.bz2 and Packages.bz2.  You want to tell
me that this is not possible; however, your description of the process
makes it look like it is possible, though unlikely.

The only way to prevent such a race would be either to (a) prevent
mirror updates (i.e. 'lock' the archive) during 'apt-get update'
sessions, or (b) guarantee that the Sources.bz2 and Packages.bz2
downloads start at exactly the same point in time.

I think neither (a) nor (b) can be implemented.  You can only implement
(c): ensure consistency using the checksums in the Release file, and
retry the download ad infinitum until the checksums match.
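Option (c) could be sketched roughly as follows.  This is only an
illustration under stated assumptions, not what apt or multistrap do:
'fetch' is a hypothetical callable that returns the bytes of a path
relative to dists/<suite>/, the Release file is assumed to carry a
SHA256 field, and the retry count is bounded instead of looping
forever.

```python
import hashlib

def parse_sha256(release_text):
    """Map index path -> expected SHA256 hex digest from a Release file."""
    expected, in_field = {}, False
    for line in release_text.splitlines():
        if in_field and line.startswith(" "):
            # Entries look like: " <digest> <size> <path>"
            digest, _size, path = line.split()
            expected[path] = digest
        else:
            # Any non-indented line ends the field or starts a new one.
            in_field = line.strip() == "SHA256:"
    return expected

def fetch_consistent(fetch, paths, max_attempts=5):
    """Fetch Release plus the given indices, retrying until all of them
    match the checksums listed in that same Release file.

    'fetch' is a hypothetical callable: fetch(path) -> bytes.
    """
    for _ in range(max_attempts):
        expected = parse_sha256(fetch("Release").decode("utf-8"))
        indices = {p: fetch(p) for p in paths}
        if all(hashlib.sha256(indices[p]).hexdigest() == expected.get(p)
               for p in paths):
            return indices
    raise RuntimeError("indices never matched Release; giving up")
```

If the mirror changes between two fetches, at least one index will not
match the Release file that was fetched first, so the whole set is
thrown away and fetched again.  An index that is not listed in Release
at all is treated as a mismatch as well.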

>> Sources.bz2 is going to have some packages updated to newer versions,
>> and won't correspond 100% to binaries from Packages.bz2, thus
>> violating the GPL

> Not true - unless you're talking about waiting for packages to arrive
> from the buildd's, but that is a very small space of time normally and
> if that bothers you, just don't use unstable for the kind of tasks
> where you need synchronised sources. Unstable does NOT make that
> promise and there is nothing apt or multistrap can do about it.

Note that I wasn't talking about packages in the pool, but only about
the indices that were "snapshotted" by 'apt-get update'.

>> Of course I guess the error rate will not be too high, at least not
>> over a normal high-rate internet connection.

> It's nothing to do with the download speeds.

Well, try 'apt-get update' over a modem line and notice how much time
passes between fetching Sources.bz2 and fetching Packages.bz2.
Pretty long time for a mirror update to occur in between?!

cheers,

David

[1] http://sourceforge.net/projects/debparanoia/
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40
