On 05/27/2013 11:48 AM, Zdenek Pavlas wrote:
And there are package diffs, which are ed-style diffs of the
Packages file I mentioned above.  This approach would work quite well
for primary.xml because it doesn't contain cross-references between
packages using non-natural keys.  It doesn't work for the SQLite
database, either in binary or SQL dump format, because of the reliance
on artificial primary keys (such as package IDs).
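For readers who haven't looked at these, an ed-style diff is a script of line-addressed commands. The sketch below is a minimal, hypothetical applier for the `a`/`c`/`d` subset that `diff -e` emits. It relies on `diff -e` listing commands from the bottom of the file upwards, so that earlier edits never shift the line numbers used by later ones. It does not handle ed's escaping of a text line that consists of a single "." or any other ed command. The name `apply_ed` is made up for illustration.

```python
import re

# "N" or "N,M" followed by one of a (append), c (change), d (delete)
_CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")

def apply_ed(lines, script):
    """Apply a `diff -e` style script to a list of lines (without newlines).

    Assumes commands appear in descending line order, as `diff -e` emits
    them, so each command's line numbers refer to the original file.
    """
    out = list(lines)
    it = iter(script.splitlines())
    for cmd in it:
        m = _CMD.match(cmd)
        if not m:
            raise ValueError(f"unsupported ed command: {cmd!r}")
        first = int(m.group(1))
        last = int(m.group(2) or first)
        op = m.group(3)
        text = []
        if op in "ac":
            # Text lines follow the command, terminated by a lone "."
            for line in it:
                if line == ".":
                    break
                text.append(line)
        if op == "a":
            out[first:first] = text        # insert after 1-based line `first`
        elif op == "c":
            out[first - 1:last] = text     # replace lines first..last
        else:
            out[first - 1:last] = []       # delete lines first..last
    return out
```

Since the commands address plain line numbers, this works as long as the file's lines are self-contained records. That is why a SQLite database, with rows that reference each other through artificial IDs, doesn't fit the model.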

I've once tried this. With about 10k packages in fedora-updates, the delta
over 2-3 days was +491 -479. Assuming deletions are cheap, the delta should
ideally be 5%. As expected, binary bsdiff yields a much bigger (~29%) delta.

A line-wise diff is much smaller because dependencies and package descriptions mostly stay the same. (This assumes consistent sorting of the primary.xml file.)
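The effect of sorting on a line-wise delta is easy to check. The sketch below uses Python's `difflib` as a stand-in for `diff` and counts added and removed lines. The function name `linewise_delta` is hypothetical. `autojunk=False` is needed because repetitive metadata lines would otherwise be treated as junk by the default heuristic.

```python
from difflib import SequenceMatcher

def linewise_delta(old, new):
    """Return (added, removed) line counts for a line-wise diff old -> new."""
    sm = SequenceMatcher(None, old, new, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            removed += i2 - i1     # lines dropped from old
            added += j2 - j1       # lines introduced in new
    return added, removed
```

If one entry in a consistently sorted file changes, the result is (1, 1). If the same entries are written out in a different order, almost every line shows up as changed, even though the content is identical. For the figures above, +491 entries out of roughly 10k is about 4.9%, which is where the "ideally 5%" estimate comes from.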

Can you point me to the primary.xml -> SQLite translation in yum? I've got a fairly efficient primary.xml parser. It might be interesting to see if it's possible to reduce the latency introduced by the SQLite conversion to close to zero. (Decompression and INSERTs can be interleaved with downloading, and maybe the index creation improvements in SQLite are sufficient these days.)
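To illustrate what interleaving could look like, here is a rough sketch. It does not reproduce yum's translation. It stream-parses a gzip-compressed primary.xml with `xml.etree.ElementTree.iterparse` and inserts rows in batches while input is still arriving, then builds the index after the bulk load. The `stream` argument can be any binary file-like object, for example an HTTP response, so decompression and parsing proceed as bytes are read. The table `packages(name, arch, epoch, ver, rel)`, the index name `pkg_name` and the function `load_primary` are hypothetical simplifications, not yum's real primary.sqlite schema. The code matches elements by local name and ignores XML namespaces.

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET

def _local(tag):
    return tag.rsplit("}", 1)[-1]          # strip a "{namespace}" prefix

def load_primary(stream, db: sqlite3.Connection, batch=1000):
    """Stream gzip-compressed primary.xml into a simplified, hypothetical table."""
    db.execute("CREATE TABLE IF NOT EXISTS packages"
               " (name TEXT, arch TEXT, epoch TEXT, ver TEXT, rel TEXT)")
    insert = "INSERT INTO packages VALUES (?, ?, ?, ?, ?)"
    rows = []
    with gzip.GzipFile(fileobj=stream) as fh:  # decompress incrementally
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if _local(elem.tag) != "package":
                continue
            fields = {_local(child.tag): child for child in elem}
            v = fields["version"].attrib
            rows.append((fields["name"].text, fields["arch"].text,
                         v.get("epoch"), v.get("ver"), v.get("rel")))
            elem.clear()                   # free the parsed <package> subtree
            if len(rows) >= batch:         # insert while input is still arriving
                db.executemany(insert, rows)
                rows.clear()
    db.executemany(insert, rows)
    # Creating the index after the bulk load is usually cheaper than
    # maintaining it during every INSERT.
    db.execute("CREATE INDEX IF NOT EXISTS pkg_name ON packages(name)")
    db.commit()
```

A real translation would also have to fill the provides/requires tables and their package-ID foreign keys, and that is where most of the conversion cost is likely to be.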

However, for many users that follow unstable or testing, package diffs
are currently slower than downloading the full Packages file because the
diffs are incremental (i.e., they contain the changes from file version
N to N+1, and you have to apply all of them to get to the current
version) and apt-get can easily write 100 MB or more because the
Packages file is rewritten locally multiple times.

Yes, patch chaining should be avoided.  I'd like to use N => 1 deltas
that could be applied to many recent snapshots.
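Here is one way an N => 1 scheme could be organized, sketched under assumptions. The server keeps the last few snapshots, identifies each one by a content hash, and precomputes a single delta from each snapshot directly to the newest version. A client then applies exactly one delta, however far behind it is, and rewrites its local file once. `DeltaServer`, `make_delta`, `apply_delta` and `digest` are hypothetical names, `difflib` stands in for a real diff tool, and SHA-256 over the joined lines is an arbitrary choice of identifier.

```python
import hashlib
from difflib import SequenceMatcher

def make_delta(old, new):
    """Hunks (start, end, lines): replace old[start:end] with lines."""
    sm = SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_delta(old, hunks):
    out, pos = [], 0
    for start, end, lines in hunks:
        out += old[pos:start] + lines
        pos = end
    return out + old[pos:]

def digest(lines):
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()

class DeltaServer:
    """Serves one direct delta from any retained snapshot to the latest."""

    def __init__(self, keep=10):
        self.keep = keep
        self.snapshots = []      # (digest, lines), oldest first
        self.deltas = {}         # snapshot digest -> delta to latest

    def publish(self, lines):
        self.snapshots.append((digest(lines), list(lines)))
        del self.snapshots[:-self.keep]          # drop snapshots beyond `keep`
        latest = self.snapshots[-1][1]
        # Every delta targets the newest version, so no chaining is needed.
        self.deltas = {d: make_delta(old, latest)
                       for d, old in self.snapshots[:-1]}

    def fetch(self, client_digest):
        """Return a delta, [] if already current, or None for a full download."""
        if self.snapshots and client_digest == self.snapshots[-1][0]:
            return []
        return self.deltas.get(client_digest)
```

The trade-off is storage and CPU on the server, since each publish produces up to `keep - 1` deltas, in exchange for one patch step and one local rewrite on the client.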

The Debian package diffs could be combined efficiently in the client because it's possible to combine diffs for two adjacent versions without actually knowing what the old or new versions look like. But this hasn't been implemented in APT because of the ABI impact (which is a bit puzzling, but anyway). Instead, the diffs should soon be combined on the archive side.
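To show why no file contents are needed, here is a sketch of composing two diffs, A->B and B->C, into one A->C diff. It is not APT's or dak's code. The idea is to represent B symbolically as a sequence of "copy lines i..j of A" and "literal lines" pieces. Applying the second diff to that sequence gives the same kind of description of C, which can then be turned back into hunks against A. The tail of A is treated as an infinite copy, so not even A's length is required. Diffs here use an assumed normalized form, a sorted list of `(start, end, lines)` hunks meaning "replace old[start:end] with lines" (0-based, half-open). Real pdiffs are ed scripts and would have to be converted into this form first. `compose`, `apply` and `diff_lines` are hypothetical names.

```python
import math
from difflib import SequenceMatcher

INF = math.inf  # stands in for "to the end of A", whose length is unknown

def diff_lines(old, new):
    """Build hunks (start, end, lines) with difflib, for testing."""
    sm = SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply(diff, lines):
    out, pos = [], 0
    for start, end, new in diff:
        out += lines[pos:start] + new
        pos = end
    return out + lines[pos:]

def _pieces(diff):
    """Describe the diff's output as ("copy", a, b) / ("lit", lines) pieces."""
    pieces, pos = [], 0
    for start, end, lines in diff:
        if start > pos:
            pieces.append(("copy", pos, start))
        if lines:
            pieces.append(("lit", list(lines)))
        pos = end
    pieces.append(("copy", pos, INF))        # unchanged tail of unknown length
    return pieces

def _slice(pieces, i, j):
    """Pieces covering lines [i, j) of the described file (j may be INF)."""
    out, pos = [], 0
    for p in pieces:
        n = len(p[1]) if p[0] == "lit" else p[2] - p[1]
        end = pos + n
        lo, hi = max(pos, i), min(end, j)
        if lo < hi:
            if p[0] == "lit":
                out.append(("lit", p[1][lo - pos:hi - pos]))
            else:
                out.append(("copy", p[1] + lo - pos, p[1] + hi - pos))
        pos = end
        if pos >= j:
            break
    return out

def compose(d1, d2):
    """Combine diffs A->B and B->C into A->C without seeing A, B or C."""
    b = _pieces(d1)
    c, pos = [], 0
    for start, end, lines in d2:
        c += _slice(b, pos, start)
        if lines:
            c.append(("lit", list(lines)))
        pos = end
    c += _slice(b, pos, INF)
    # Copies of A stay in increasing order, so gaps between them are hunks.
    hunks, apos, pending = [], 0, []
    for p in c:
        if p[0] == "lit":
            pending += p[1]
        else:
            if p[1] > apos or pending:
                hunks.append((apos, p[1], pending))
            apos, pending = p[2], []
    return hunks
```

The result only depends on the sizes of the hunks, not on the unchanged lines. That is why, in principle, the combining could happen on either the client or the archive side.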

--
Florian Weimer / Red Hat Product Security Team
--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
