On Tue, May 20, 2025 at 2:07 PM Branko Čibej <br...@apache.org> wrote:
> On 18. 5. 25 21:48, Branko Čibej wrote:
>
> XML has the unenviable distinction of being *both* almost unreadable for
> humans *and* very finicky to parse for machines.
>
>
> There's one other nasty problem with XML: it can't represent every
> character. There's a test for that, xml_unsafe_author2() in prop_tests.py
> and discussion at
>
> https://issues.apache.org/jira/browse/SVN-4415
>
> but the really painful part is that our command-line client is quite happy
> to produce invalid XML. Yeah, the *expected output* in that test case is
> invalid XML, heh. I've been thinking about how to solve this; we can't use
> &#*xx*; character entities, we can't use <![CDATA[...]]> sections – both
> are transparent to invalid XML chars. Of course I'm talking about our XML
> output here; we could base64- or quoted-printable-encode values that are
> not valid XML, and we wouldn't be breaking any existing use cases.
>
> Well, that's for command-line output. An XML patch format has similar
> issues. Any patch format does, but XML is especially nasty in that respect.
>
> I created SVN-4919 to track this in the client and to annotate the test.
>
> -- Brane
>

I know we've been discussing an XML-based format for xpatch, including
the pros & cons of being XML-based...

And then I came across this:

[1] https://diffx.org/

This page proposes enhancing the unidiff format in a backwards- and
forwards-compatible way while remaining human-readable; it proposes
calling the format Extensible Diff, or DiffX.

I have only skimmed the site and have not done a thorough analysis, but
I think it is interesting enough to at least look through and consider.

I'll give it a more careful reading a bit later and organize my
thoughts about it; for now, I just wanted to point out that this exists.

Thoughts/feedback?

Nathan
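
For reference, the base64 idea from the quoted message could look
roughly like the sketch below. It is only an illustration, not what the
client does today: the function names and the ("none" / "base64")
labels are made up for this example. The check follows the XML 1.0
"Char" production, which is also why a character reference such as
&#8; does not help: a reference has to name a character that is itself
a legal Char.

```python
import base64
import re
from typing import Tuple

# Matches any character that the XML 1.0 "Char" production does NOT allow:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
#          | [#x10000-#x10FFFF]
_XML10_INVALID = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_xml_safe(text: str) -> bool:
    """Return True if every character of text is a legal XML 1.0 Char."""
    return _XML10_INVALID.search(text) is None


def encode_value(text: str) -> Tuple[str, str]:
    """Return (encoding, payload) for a value headed for XML output.

    Hypothetical helper: values that are already XML-safe pass through
    unchanged (markup escaping of &, < and > is a separate step that is
    not shown here); anything else is base64-encoded from its UTF-8 bytes.
    """
    if is_xml_safe(text):
        return ("none", text)
    # "surrogatepass" lets lone surrogates survive the trip to bytes.
    raw = text.encode("utf-8", "surrogatepass")
    return ("base64", base64.b64encode(raw).decode("ascii"))
```

A consumer would need some marker (for instance an attribute on the
element, again purely hypothetical) to know which values to decode;
values that were already valid would be emitted exactly as before.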