On 4/16/12 11:57 PM, "Martin v. Löwis" wrote:
Maybe a better checksum would be a global hash calculated differently?
Define a protocol, and I present you with an implementation that
conforms to the protocol, and still has inconsistent data, and not
in a malicious manner, but due to bugs/race conditions/unexpected
events. It's pointless.
If you calculate a checksum over all mirrored files, you can
guarantee that the bits are the same on both sides, no? Then you
have the md5 checksum per file for the download, so the client can
be sure they got an uncorrupted file.
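Roughly what I have in mind, as an untested sketch (layout and names
are made up): one md5 per mirrored file, plus a single global hash
built from the sorted per-file digests, so master and mirror only
have to compare one value.

import hashlib
import os

def file_md5(path, chunk_size=2 ** 20):
    """md5 of one file, read in chunks so large sdists don't eat memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def mirror_checksums(root):
    """Map every file under `root` (relative path) to its md5."""
    checksums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            checksums[os.path.relpath(full, root)] = file_md5(full)
    return checksums

def global_hash(checksums):
    """One value for the whole tree: if master and mirror agree on it,
    the bits are the same on both sides."""
    digest = hashlib.md5()
    for rel in sorted(checksums):
        digest.update(('%s %s\n' % (rel, checksums[rel])).encode('utf-8'))
    return digest.hexdigest()

The client side then just recomputes the md5 of the file it
downloaded and compares it to the advertised per-file value.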
Ultimately, clients will need to verify the
data that they receive (if they suspect issues), and fall back gracefully.
How can they know if version 1.3 of package foo never made it to the
mirror they use?
They can't. They have to trust the last-modified date and assume
that the mirror is fresh enough for foo 1.3 to be present on both
the master and the mirror.
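On the client side that could look roughly like this (just a sketch;
the /last-modified URL, the ISO timestamp format and the freshness
threshold are all assumptions on my part, not something fixed
anywhere):

from datetime import datetime, timedelta
from urllib.request import urlopen

MASTER = 'https://pypi.python.org'
MAX_AGE = timedelta(hours=2)      # arbitrary freshness threshold

def pick_index(mirror):
    """Use the mirror if it synced recently enough, else fall back."""
    try:
        raw = urlopen(mirror + '/last-modified', timeout=10).read()
        last_sync = datetime.strptime(raw.strip().decode('ascii'),
                                      '%Y-%m-%dT%H:%M:%S')
    except Exception:
        return MASTER             # unreachable or unparsable: fall back
    if datetime.utcnow() - last_sync > MAX_AGE:
        return MASTER             # stale: foo 1.3 may not be there yet
    return mirror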
I can definitely see a mirroring implementation where the
last-modified field is updated at the end of the run even though some
packages were not copied over because of some network issue.
That mirroring implementation would violate the principle that
last-modified should only be updated when the mirroring run was
completed successfully.
Like a file that claims in its metadata that it's 512 KB long but
only has 2 KB in reality because something went wrong?
I think the idea of the checksum is to double-check that kind of claim.
But maybe that's overkill?
Maybe the mirroring code should check, file by file, that everything
was copied correctly?
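Something in that direction, as a rough sketch (the `expected`
mapping of relative path to (size, md5) is hypothetical and would
have to come from the master's metadata): verify every file after
the copy, and only write last-modified once the whole run checks
out, which also matches the principle above.

import hashlib
import os
from datetime import datetime

def verify_mirror(root, expected):
    """Return the list of files that are missing or don't match."""
    bad = []
    for rel, (size, md5) in expected.items():
        path = os.path.join(root, rel)
        if not os.path.exists(path) or os.path.getsize(path) != size:
            bad.append(rel)       # missing, or the 512 KB vs 2 KB case
            continue
        with open(path, 'rb') as f:
            if hashlib.md5(f.read()).hexdigest() != md5:
                bad.append(rel)
    return bad

def finish_run(root, expected):
    bad = verify_mirror(root, expected)
    if bad:
        raise RuntimeError('mirror run incomplete: %r' % bad)
    # only now advertise the run as successful
    with open(os.path.join(root, 'last-modified'), 'w') as f:
        f.write(datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S'))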
Cheers
Tarek
Regards,
Martin
_______________________________________________
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig