On Sat, Sep 03, 2005 at 03:51:33AM +0100, Ian Lynagh wrote: > On Fri, Sep 02, 2005 at 03:07:51PM -0700, Bill Trost wrote: > > So, I propose that a binary patch consist of a sequence of the 5-tuple > > > > (old-start, old-length, new-start, new-length, bitmask) > > > > Applying a binary patch consists of doing the following with each > > 5-tuple: > > > > 1. Truncating or zero-extending the bitmask and source region so > > they are as long as the target region. > > > > 2. XORing the bitmask and source region to generate the target > > region. > > Sounds good to me. I think commuting with current binary patches is > possible, albeit slightly nasty.
Sounds good to me too. But I don't think binary patches should ever commute... to me that's a lot of what defines a binary patch. You don't want to ever merge two changes to a binary file and get a merged result--the assumption is that anything darcs does other than record versions may "break" the file. This makes binary patches simpler, since they just need to be invertible. > We could also do > (old-start, old-length, new-start, new-length, old-content, new-content) > of course, but once you stop storing the complete file in each patch I > think having the actual content in the patch has very little advantage > over the bitmask idea. This format might be nice for non-binary files, though. What you describe is basically a character-based hunk patch type, which would be a nice feature for certain sorts of text files. Interestingly, the diff code for generating character-based hunk patches could be the same as the code for generating the binary patches described above, except that when generating binary patches you might have different priorities (fewer larger hunks might be better, and we might want to scale better to larger files, since we benefit only in space from breaking a binary patch into smaller hunks). But there could certainly be a lot of code reuse between the two. And we also ought to be able to leverage our existing diff. > Does this all fly? Presuming I'm not completely out in left field, it > seems like integrating the format into darcs promptly would be useful, > just to ease the backward-compatibility burden. The bleeding-edge > version of darcs could start generating full-body XORs soon thereafter, > and people could start experimenting with other diff algorithms at their > convenience. It might even be useful to add a "--binary-diffs" flag to > "darcs optimize" someday, especially for those maintaining public darcs > repositories. Indeed that sounds like a good plan of attack. Hopefully our new "RepoFormat" code will also ease the forward and backward compatibility issues (and I'm definitely looking forward to our first opportunity to make use of that framework). Bill, were you interested in implementing some of this support? i.e. are you either a haskeller or one who'd be interested/willing to learn haskell? It's a great language... -- David Roundy http://www.darcs.net _______________________________________________ darcs-devel mailing list [email protected] http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
