David Roundy wrote:
    On Sat, Sep 03, 2005 at 08:57:00PM -0700, Bill Trost wrote:
    > Here's an off-the-cuff proposal:
    > 
    >       binary-diff FILENAME DIFF-ALGORITHM
    >       old-start START
    >       old-length LENGTH
    >       new-start START
    >       new-length LENGTH
    >       *HEXHEXHEX
    >       *...
    >       ...repeat with "old-start" for each chunk...

    We've talked about this recently, and I think the consensus was that
    we wouldn't mind for binary patches to be binary....

My only concern is the email encoding. "darcs send" currently just spits
the uncompressed patch to the destination. Would someone have to write a
MIME converter for binary patches?

Anyhow, then, a binary format proposal: Start with the line of text

  binary-xor FILENAME DIFF-ALG OLD-START OLD-LEN NEW-START NEW-LEN MASK-LEN

Then, the next "line" consists of MASK-LEN bytes of XOR mask, and the
next part of the compound diff starts at the byte following the XOR
mask.
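To make the framing concrete, here is a minimal reader sketch for the header-plus-mask layout described above. It is only an illustration under some assumptions of mine: the fields are separated by single spaces, the numbers are decimal, and the names XorHunk and read_hunks are made up for this example. It says nothing about how the mask is applied when OLD-LEN and NEW-LEN differ.

```python
import io
from typing import BinaryIO, Iterator, NamedTuple


class XorHunk(NamedTuple):
    """One "binary-xor" hunk (hypothetical in-memory form)."""
    filename: bytes
    diff_alg: bytes
    old_start: int
    old_len: int
    new_start: int
    new_len: int
    mask: bytes


def read_hunks(stream: BinaryIO) -> Iterator[XorHunk]:
    """Yield hunks from a stream of header lines, each followed by raw mask bytes."""
    while True:
        header = stream.readline()
        if not header:
            return  # clean end of stream
        # Split from the right so a FILENAME containing spaces survives
        # (assumption: the six trailing fields never contain spaces).
        head, alg, *nums = header.rstrip(b"\n").rsplit(b" ", 6)
        keyword, filename = head.split(b" ", 1)
        if keyword != b"binary-xor":
            raise ValueError("not a binary-xor header: %r" % header)
        old_start, old_len, new_start, new_len, mask_len = map(int, nums)
        # The mask is raw binary and may contain newlines, so read exactly
        # MASK-LEN bytes; the next header starts at the following byte.
        mask = stream.read(mask_len)
        if len(mask) != mask_len:
            raise ValueError("truncated XOR mask")
        yield XorHunk(filename, alg, old_start, old_len,
                      new_start, new_len, mask)


if __name__ == "__main__":
    sample = (b"binary-xor a.bin filewide-xor 0 3 0 3 3\n" + b"\x01\n\x02"
              + b"binary-xor a.bin filewide-xor 8 2 8 2 2\n" + b"\xff\x00")
    for hunk in read_hunks(io.BytesIO(sample)):
        print(hunk)
```

Because the length is in the header, the reader never has to scan the mask for a terminator, which is the main attraction of this layout over the hex-encoded one.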

    > I added DIFF-ALGORITHM so that "darcs optimize" has an easy way
    > of deciding if there's something to optimize -- it's not actually
    > needed for applying the patch.

    I think I'd lean against including a diff-algorithm flag.  You're
    right that it could be useful but it's also something we'd have to
    live with indefinitely, and I don't like that.  I'd prefer to add a
    flag to optimize to rediff binary patches and let the user decide.

Remember, this is supposed to be a generic binary diff format that
can support a variety of different diff algorithms -- filewide-XOR,
xdelta, and a bsdiff-XOR are all plausible examples. "darcs optimize"
needs some way of determining whether there's any point in trying to
optimize the diff. A filewide-XOR could be profitably converted to an
xdelta-generated diff, but not vice-versa. I don't think we want to
force "darcs optimize" to generate a new diff for each patch when the
patch has already been optimized.
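As a rough sketch of the check I have in mind, the recorded algorithm name lets the optimizer skip patches that cannot improve. The ranking table and the function name worth_rediffing are hypothetical, and the only ordering assumed is the one stated above (filewide-XOR below xdelta).

```python
# Assumed ranking: a higher number means a more compact diff is expected.
# Only the filewide-XOR < xdelta ordering comes from the discussion above.
ALG_RANK = {"filewide-xor": 0, "xdelta": 1}


def worth_rediffing(recorded_alg: str, target_alg: str) -> bool:
    """Return True if re-diffing with target_alg might shrink the patch."""
    target = ALG_RANK[target_alg]
    recorded = ALG_RANK.get(recorded_alg)
    if recorded is None:
        # Unknown algorithm: we cannot prove it is already optimal.
        return True
    return recorded < target


if __name__ == "__main__":
    print(worth_rediffing("filewide-xor", "xdelta"))
    print(worth_rediffing("xdelta", "xdelta"))
```

Without the DIFF-ALG field, the only way to answer the same question is to generate the new diff and compare sizes.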

    I think I'd also lean towards sticking the starts and lengths all on
    the same line with the "binary-diff" and not labelling them.  Or we
    could label them with just a + and -.

I don't understand what you're proposing. Could you provide an example?

    And don't old-start and new-start have to be identical?

    binary-diff FILENAME START -OLDLENGTH +NEWLENGTH

Imagine a tar file in which a new component has been added at the
front. Excluding tar meta-data, the optimal patch would consist of one
hunk that adds the new component to the front, and another hunk that
copies the old files (which started at offset zero, roughly) to the
offset roughly equal to the length of the new component.
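A toy version of that case, ignoring tar meta-data entirely, shows why the two starts differ: the surviving data sits at offset zero in the old file but at a nonzero offset in the new one. The Hunk type and prepend_hunks are invented for this sketch and are not a proposed format.

```python
from typing import List, NamedTuple


class Hunk(NamedTuple):
    old_start: int
    old_len: int
    new_start: int
    new_len: int


def prepend_hunks(old: bytes, added: bytes) -> List[Hunk]:
    """Describe new = added + old as an insert hunk plus a copy hunk."""
    return [
        # Pure insertion at the front: nothing is consumed from the old file.
        Hunk(old_start=0, old_len=0, new_start=0, new_len=len(added)),
        # The old contents move: same bytes, different starting offset.
        Hunk(old_start=0, old_len=len(old),
             new_start=len(added), new_len=len(old)),
    ]


if __name__ == "__main__":
    old = b"OLD-FILES"
    added = b"NEW"
    new = added + old
    insert, copy = prepend_hunks(old, added)
    moved_old = old[copy.old_start:copy.old_start + copy.old_len]
    moved_new = new[copy.new_start:copy.new_start + copy.new_len]
    print(copy, moved_old == moved_new)
```

With a single START field, the copy hunk could not say that the same bytes live at two different offsets.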

Then again, I may simply be missing something here. I don't understand
how the text format works -- how do you invert a patch that only
contains one line number?

Bill

_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
