Thank you for your answer!
I've just finished reading http://www.gentoo.org/proj/en/glep/glep-0025.html ;)


Brian Harring wrote:

On Thu, Feb 03, 2005 at 11:22:22PM +0100, Francesco Riosa wrote:


Thanks for pointing it out, I had never read about it!
It was something similar to that. I was thinking about patching the sources rather than the binaries, on the assumption that binary patching is difficult (both in CPU cost and in reaching a good compression level).



It's preferable to do binpatching vs line based diffing for most sources. Smaller delta/patch, and can result in a perfect recreation of the tarball- one time patching, instead of applying N patches to jump to a version.
While diff patches are compressible, the results of diffball/xdelta/bdelta (aborted deltup differencer) blow away the results of diff patches for the most part.


The main problem with diff based patches is that they're line based- change a single char in the new version, and you have to encode both the old version of the line, and the new version. On top of that, diff stores context for the patch (+/- 3 lines typically), which is extra fluff (useful for fuzziness, but not version upgrades), etc.
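To make the point about line-based patches concrete, here is a minimal sketch using Python's standard difflib. The file contents and the names a/run.sh and b/run.sh are made up for illustration; this is not how diffball or xdelta work internally, it only shows what a unified diff has to carry for a three-character change.

```python
import difflib

# Hypothetical five-line script; only one line differs between versions.
old = ['#!/bin/sh\n', 'set -e\n', 'cd /tmp\n', 'echo "old";\n', 'exit 0\n']
new = list(old)
new[3] = 'echo "new";\n'

# Count the characters that really changed on that line.
changed_chars = sum(1 for a, b in zip(old[3], new[3]) if a != b)

# Unified diff with the usual 3 lines of context.
unified = list(difflib.unified_diff(old, new, "a/run.sh", "b/run.sh", n=3))
print("".join(unified), end="")

# The patch carries the whole old line, the whole new line, and context.
removed = [l for l in unified if l.startswith("-") and not l.startswith("---")]
added = [l for l in unified if l.startswith("+") and not l.startswith("+++")]
context = [l for l in unified if l.startswith(" ")]

print("characters changed:", changed_chars)
print("lines carried by the patch:", len(removed) + len(added) + len(context))
```

A byte-oriented delta would only need to describe the changed bytes plus copy instructions for everything else, which is where the size difference comes from.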


We are working with fixed sources and destinations, so fuzziness and context can be left out. A typical "diff -dNr" looks like this:
[patch]
4c4
< echo "old";
---
> echo "new";
[/patch]


It seems that's not true, seeing his results,


E'yep :)


;)



Also, reading that reminded me that there is an md5sum problem (which can be resolved) in all of this.



The route deltup takes to resolve it is basically a hack; it's reactive- the base problem (described in http://glep.gentoo.org/glep-0025.html) is that updates to a compressor can result in a slightly different/smaller file.


Eg, with bzip2 v0.9, you get a slightly larger compressed file than with bzip2 v1.0; differencing has to decompress, patch, then recompress the tarball- deltup relies on you to have an older version of bzip2 installed if you ever run into a file that was originally compressed with v0.9. So... it has an exception added to sidestep that particular case. Problem is, you need to keep adding exceptions in for new cases, reactively.

So... it's not a huge issue, compressors don't change all that much (the current breed used- gzip/bzip2). A new compressor (ppmd/rzip fex) would result in issues.

An addendum, thinking about it: deltup has to determine how the file was originally compressed, and then use the same settings- this is stored in the fdtu iirc. Again, not really a perfect solution (it works, but it's ugly), nor particularly scalable for generating a massive set of diffs.
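A small sketch of why the recompression step matters for md5, using Python's standard bz2 and hashlib modules. It only varies the compression level, not the bzip2 version (which is the case described above), and the "tarball" is made-up data, but the effect on the checksum is the same: identical payload, different compressed bytes.

```python
import bz2
import hashlib

def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# Stand-in for an uncompressed source tarball (made-up content).
tarball = b"pretend this is an uncompressed source tarball\n" * 2000

# "Upstream" compressed with one setting, the rebuild with another.
original = bz2.compress(tarball, compresslevel=9)
rebuilt = bz2.compress(tarball, compresslevel=1)

# The decompressed payload is byte-for-byte identical...
same_payload = bz2.decompress(original) == bz2.decompress(rebuilt)
# ...but the compressed files, and therefore their md5 sums, are not.
same_md5 = md5(original) == md5(rebuilt)

print("payload identical:", same_payload)
print("md5 of .bz2 identical:", same_md5)
```

This is why a reconstruction tool has to know (or guess) the original compressor settings if the md5 of the final compressed file must match.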

So.. yeah. The md5 issue isn't really resolved, it's sidestepped with a set of special case additions.
That ^^^ is the main beef I have with deltup, it works for the most part, but the methods it takes to overcome md5 related issues are fragile/slow/strike me as hacks. :)


The way I was thinking of resolving it is to add the md5 sum of *each* patch, so that you check the patch rather than the final file. The "2list" file that sits next to every package (8000/9000 currently) would then have 4 fields instead of 3.
The new record would look like this:
- kind of the record/file (0comp|1diff)
- size of the file [kB]; this need *not* be the true size, it could be a coefficient of download preference
- md5 sum
- name
This should not be a security issue (no more than now) because portage already assumes that mirrors can be trusted.
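The record above could be handled roughly as follows. This is only a sketch: the whitespace-separated layout, the field order on the line, the Record class and the file name foo-1.0_to_1.1.patch are all assumptions made for illustration, not an existing portage format.

```python
import hashlib
from dataclasses import dataclass

KINDS = {"0comp", "1diff"}  # complete file vs. diff, as in the list above

@dataclass(frozen=True)
class Record:
    kind: str
    size_kb: int   # may be a download-preference weight, not the true size
    md5: str
    name: str

def parse_record(line: str) -> Record:
    # Assumed layout: "<kind> <size_kb> <md5> <name>"
    kind, size_kb, md5, name = line.split(None, 3)
    if kind not in KINDS:
        raise ValueError(f"unknown record kind: {kind!r}")
    return Record(kind, int(size_kb), md5.lower(), name.strip())

def verify(record: Record, payload: bytes) -> bool:
    # Check the downloaded patch (or full file) itself, not the rebuilt tarball.
    return hashlib.md5(payload).hexdigest() == record.md5

# Hypothetical patch content and matching record line.
patch_bytes = b"hypothetical binary delta\n"
line = f"1diff 2 {hashlib.md5(patch_bytes).hexdigest()} foo-1.0_to_1.1.patch"

rec = parse_record(line)
print(rec.kind, rec.name, verify(rec, patch_bytes))
```

Checking each downloaded piece this way avoids depending on a bit-identical recompressed tarball, at the price of trusting the reconstruction step.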




It also seems to be an old idea that has never been implemented; can someone please explain to me why?



I personally ran out of steam trying to resolve the issue of required mirror space for it when I took a thwack at it-
http://www.gentoo.org/proj/en/glep/glep-0025.html#distfile-mirror-additions


With the solution I've posted no initial disk space is required, no more than 9000 new files of 2k each; for a transition period the file can contain only 0comp records.
This translates into portage downloading only complete files, not patches. In any case I was thinking of keeping at least two full versions of each package, one stable and one ~arch, and for smaller files and VIP files (portage, system components and so on) keeping only the *full* package, not patches.


Once all users have upgraded to a version of portage that can handle the diffs, you can start the switch. (2006.0) (maybe hashish dreams)

If it's not a strictly opt-in feature (eg, some versions are offered on gentoo mirrors only via patches), users who don't care for patching/have a fast connection will get mad- reconstructing a file isn't the fastest thing known to man. Fex, say you're jumping from v 2.6, to v.10 of the linux kernel- w/ deltup/xdelta, you have to generate 3 intermediate files, since it can only apply one patch at a time. So... there is a space issue also, aside from the extra io.

Diffball can apply multiple patches in a single run (no intermediate files) for fdtu/xdelta patches, so that's slightly sidestepped.

There also is the question of how to generate the diffs... a dedicated box for it would be preferable, rather than foisting it off on devs.


Yep, they are developers: if there is a repeatable/boring task they will surely want (or write) a program that does it in their place :)

Offhand, deltup kind of rose up again via the dynamic deltup project. Not sure of the status of it-

http://forums.gentoo.org/viewtopic.php?t=215262&highlight=dynamic+deltup

last release as far as I can tell was in oct. The irc channel also is deserted, and the forum thread kind of shows signs of it being stopped/discontinued.


Yes, http://sourceforge.net/projects/deltup/ shows October 14, 2004 as the last release; the bugs page http://sourceforge.net/tracker/?group_id=77305&atid=549815 also shows three blocking bugs assigned to nobody.

Note I'm not affiliated with/knowledgeable about that project- so, they might still be kicking (in which case, if you're reading this ml kindly rear your head and correct me if I'm wrong)...

~brian


From the number of posts on that thread (and others), it seems to me that there is *no* lack of interest from the user side.

A lot of good-quality work seems to have been done already; I hope it will be useful to the community.

Francesco

--
[email protected] mailing list


