Re: Binary Patch (xpatch)

Branko Čibej Sun, 08 Jun 2025 12:54:03 -0700

On 8. 6. 25 20:55, Nathan Hartman wrote:

On Tue, May 20, 2025 at 2:07 PM Branko Čibej <br...@apache.org> wrote:


    On 18. 5. 25 21:48, Branko Čibej wrote:

    XML has the unenviable distinction of being *both* almost
    unreadable for humans *and* very finicky to parse for machines.


    There's one other nasty problem with XML: it can't represent every
    character. There's a test for that, xml_unsafe_author2() in
    prop_tests.py and discussion at

    https://issues.apache.org/jira/browse/SVN-4415

    but the really painful part is that our command-line client is quite
    happy to produce invalid XML. Yeah, the /expected output/ in that
    test case is invalid XML, heh. I've been thinking about how to
    solve this; we can't use &#/xx/; character entities, we can't use
    <![CDATA[...]]> sections – both are transparent to invalid XML
    chars. Of course I'm talking about our XML output here; we could
    base64- or quoted-printable-encode values that are not valid XML,
    and we wouldn't be breaking any existing use cases.

    Well, that's for command-line output. An XML patch format has
    similar issues. Any patch format does, but XML is especially nasty
    in that respect.

    I created SVN-4919 to track this in the client and to annotate the
    test.

    -- Brane
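For the command-line output case in the quoted message above, a minimal sketch of the base64 fallback might look like this. The helper names and the "plain"/"base64" tags are assumptions made up for illustration and are not part of Subversion; only the character ranges are taken from the XML 1.0 Char production.

```python
import base64
import re

# Characters permitted by the XML 1.0 "Char" production.
_XML_CHARS = re.compile(
    "[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*"
)

def is_xml_safe(text):
    """True if every character in text may appear in an XML 1.0 document."""
    return _XML_CHARS.fullmatch(text) is not None

def encode_value(text):
    """Return (tag, payload). Hypothetical helper: values that XML cannot
    carry are base64-encoded, everything else is passed through unchanged."""
    if is_xml_safe(text):
        return ("plain", text)
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return ("base64", payload)

print(encode_value("jrandom")[0])        # plain
print(encode_value("foo\x05bar")[0])     # base64
```

Existing consumers would see no change for values that are already valid, which is the point made above about not breaking existing use cases.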

I know we've been discussing an XML-based format for xpatch, including the pros & cons of being XML-based...


And then I came across this:

[1] https://diffx.org/

This is a page that proposes enhancing the unidiff format in a backwards- and forwards-compatible way while remaining human readable; it proposes calling the format Extensible Diff or DiffX.

I have only skimmed the site and have not done a thorough analysis, but I think this is interesting enough to at least look through and consider.

I'll give it a more careful reading a bit later and will organize my thoughts about it; for now, I just wanted to point out that this exists.


Thoughts/feedback?

Looks good at first glance but I detect a certain failure of imagination from the authors. Because if the format is extensible, but the extensions aren't standardised and codified, then we're back to where we are now: with 17 different, almost-but-not-quite compatible diff formats. For example, they carry on about character encodings, but spend not one word on newlines. Or normalization forms. Or any of the other 100 ways the "same" character encoding may send you gibbering over a cliff.
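
To make the normalization point concrete, here is a small sketch (the function name is invented for this example): two strings that render identically and are both valid UTF-8, yet differ byte for byte, so a line-based diff would treat them as different.

```python
import unicodedata

composed = "\u010d"      # one code point: LATIN SMALL LETTER C WITH CARON
decomposed = "c\u030c"   # two code points: "c" + COMBINING CARON

def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed == decomposed)                  # False
print(len(composed.encode("utf-8")),
      len(decomposed.encode("utf-8")))         # 2 3
print(same_after_nfc(composed, decomposed))    # True
```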

Yeah, the .diff extension, when the standard since at least 40 years ago is .patch. Guess what? These people don't have a clue. No, really, I mean it.


Mutability. Sooooo ... unidiffs aren't mutable? That's a selling point?

Their example about the "encoding" attribute is wrong. It says:

#..preamble:  encoding=utf-32, length=217


and then goes on to say:

length (integer – required):

    The length of the section’s content in bytes.


Please show me a valid utf-32 string that's 217 bytes long.
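
The arithmetic behind that: UTF-32 spends exactly four bytes on every code point, so any valid byte length is a multiple of four, and 217 is not. A quick check in Python (the function name is made up for this illustration):

```python
def could_be_utf32(length_in_bytes):
    """True if a byte count is at least consistent with UTF-32."""
    return length_in_bytes % 4 == 0

# Python's "utf-32" codec prepends a 4-byte BOM; "utf-32-le" does not.
with_bom = "diff".encode("utf-32")
without_bom = "diff".encode("utf-32-le")

print(len(with_bom), len(without_bom))   # 20 16
print(could_be_utf32(217))               # False
```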

Line endings ... oh, yes, they're mentioned in the spec. Except that there's no provision for mixed line endings, which we have to deal with far too often.
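
As a rough sketch of what "mixed" means in practice, the following (hypothetical) helper reports every line-ending style present in a blob of bytes. A format that can only declare a single style per section has nowhere to record a result with more than one entry.

```python
def line_ending_styles(data: bytes) -> set:
    """Return the set of line-ending styles ("LF", "CRLF", "CR") found in data."""
    styles = set()
    for line in data.splitlines(keepends=True):
        if line.endswith(b"\r\n"):
            styles.add("CRLF")
        elif line.endswith(b"\n"):
            styles.add("LF")
        elif line.endswith(b"\r"):
            styles.add("CR")
    return styles

print(line_ending_styles(b"one\ntwo\n") == {"LF"})             # True
print(line_ending_styles(b"one\r\ntwo\n") == {"CRLF", "LF"})   # True
```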

DiffX files have no default encoding.

Oh cool. But your spec assumes the encoding is a superset of ASCII. The spec doesn't support EBCDIC or other different encodings. I guess, these days, that's sort of manageable. But they don't even mention anything that's not compatible with ASCII, and call it "universal".
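
A small illustration of the ASCII assumption, using Python's cp037 codec as one example of an EBCDIC code page. The detection function is a deliberately naive, hypothetical parser step, not anything from the DiffX spec; the marker text is the one from the example quoted earlier.

```python
MARKER = "#..preamble:"

def looks_like_section_header(data: bytes) -> bool:
    """Naive check that assumes section markers are ASCII bytes."""
    return data.startswith(b"#..")

ascii_bytes = MARKER.encode("ascii")
ebcdic_bytes = MARKER.encode("cp037")   # cp037: one common EBCDIC code page

print(looks_like_section_header(ascii_bytes))    # True
print(looks_like_section_header(ebcdic_bytes))   # False
```

The same text, encoded in a non-ASCII-compatible encoding, is invisible to a parser that looks for the marker bytes before it knows the encoding.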

I'm rambling. But, basically, this proposal is as much of a mess as any other. They don't even give a formal syntax that parsers could follow, just a bunch of examples and hand-waving. Yet another wannabe spec that doesn't start with a testable theory of changes -- a diff algebra if you like, with all the various mutations and edge cases -- and dives straight into "let's take unidiff and tweak it a bit". I guess the other way is a lot of work and sounds too much like maths. They don't even consider how to represent something that can be 3-way merged, let alone 4-way. Tree mutations? What are those? Etc. ad nauseam.


TL;DR: It's well-meaning crap, which is the worst kind.

-- Brane
