On Tue, May 20, 2025 at 2:07 PM Branko Čibej <br...@apache.org> wrote:

> On 18. 5. 25 21:48, Branko Čibej wrote:
>
> XML has the unenviable distinction of being *both* almost unreadable for
> humans *and* very finicky to parse for machines.
>
>
> There's one other nasty problem with XML: it can't represent every
> character. There's a test for that, xml_unsafe_author2() in prop_tests.py
> and discussion at
>
>   https://issues.apache.org/jira/browse/SVN-4415
>
> but the really painful part is that our command-line client is quite happy
> to produce invalid XML. Yeah, the *expected output* in that test case is
> invalid XML, heh. I've been thinking about how to solve this; we can't use
> &#*xx*; character entities, we can't use <![CDATA[...]]> sections – both
> are transparent to invalid XML chars. Of course I'm talking about our XML
> output here; we could base64- or quoted-printable-encode values that are
> not valid XML, and we wouldn't be breaking any existing use cases.
>
> Well, that's for command-line output. An XML patch format has similar
> issues. Any patch format does, but XML is especially nasty in that respect.
>
> I created SVN-4919 to track this in the client and to annotate the test.
>
> -- Brane
>

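To make Brane's point about unrepresentable characters concrete, here is a minimal Python sketch. It checks each character against the XML 1.0 "Char" production and falls back to base64 (of the UTF-8 bytes) when a value contains a character such as U+0001, which cannot appear in XML even as &#1;. The function names and the encoding="base64" attribute are hypothetical illustrations, not an existing Subversion output format, and the sketch assumes input strings are well-formed Unicode.

```python
import base64
from xml.sax.saxutils import escape


def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml_safe(value: str) -> bool:
    """True if every character of value may appear in an XML 1.0 document."""
    return all(is_xml_char(ord(ch)) for ch in value)


def xml_element(name: str, value: str) -> str:
    """Render value as <name>...</name>, base64-encoding it when needed.

    Hypothetical convention: an encoding="base64" attribute marks values
    that contained characters XML 1.0 cannot represent.
    """
    if is_xml_safe(value):
        # All characters are legal, so escaping &, < and > is sufficient.
        return f"<{name}>{escape(value)}</{name}>"
    # Neither &#xx; references nor CDATA can carry these characters,
    # so encode the whole value's UTF-8 bytes instead.
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f'<{name} encoding="base64">{encoded}</{name}>'


if __name__ == "__main__":
    print(xml_element("author", "jrandom"))   # <author>jrandom</author>
    print(xml_element("author", "a\x01b"))    # <author encoding="base64">YQFi</author>
```

Base64 is used here only because Brane mentions it; quoted-printable would work the same way structurally, and either keeps existing consumers working for values that were already valid XML.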

I know we've been discussing an XML-based format for xpatch, including the
pros & cons of being XML-based...

And then I came across this:

[1] https://diffx.org/

This page proposes enhancing the unidiff format in a backwards- and
forwards-compatible way while keeping it human readable; it calls the
format Extensible Diff, or DiffX.

I have only skimmed the site so far, and although I have not done a
thorough analysis, I think it is interesting enough to at least look
through and consider.

I'll give it a more careful reading a bit later and will organize my
thoughts about it; for now, I just wanted to point out that this exists.

Thoughts/feedback?

Nathan
