Hi all,

Thursday 04 June 2009 02:53:46 Trent W. Buck wrote:
> Gwern Branwen <[email protected]> writes:
> > There isn't any schema I know of. You really just have to parse it
> > kind of ad-hoc.
>
> And as we've seen in the Darcs repo, input isn't recoded into UTF-8, so
> in *one output document* from changes --xml you can have ISO 8859-1
> bytes, UTF-8 bytes, and JIS bytes. Which basically means it's not XML :-(

But the contents of files in the repo are not text, they are bytes. That also holds for text files, which are managed as lines of bytes delimited by a newline. How should we deal with that in XML?

A quick Google search turns up the suggestion to either use base64 or store the binary data outside the XML and make the XML refer to it. Both of those seem really bad for readability.

Perhaps we can use quoted-printable encoding(*) inside the XML? It sounds somewhat Frankensteinian, but we may have code for that lying around already, and it encodes the non-ASCII bytes while keeping the result readable as text. In fact, Google returns results about using quoted-printable in XML, so it's not that weird an idea.

Regards,
Reinier

(*): quoted-printable encoding is what is used for e-mail text in encodings other than ASCII. It preserves most ASCII characters, but escapes each non-ASCII byte as "=" followed by two hex digits.
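
To make the quoted-printable idea concrete, here is a rough sketch in Python (not Darcs code, and not a proposal for the actual output format). It uses the standard quopri module; the helper names qp_xml_text and qp_xml_bytes and the <name> element are made up for illustration. Note that quoted-printable leaves "<", ">" and "&" alone, so the usual XML escaping is still needed on top of it.

```python
import quopri
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def qp_xml_text(raw: bytes) -> str:
    """Hypothetical helper: bytes -> ASCII text that is safe as XML content."""
    # Quoted-printable turns every non-ASCII byte (and "=") into =XX.
    qp = quopri.encodestring(raw).decode("ascii")
    # QP does not touch <, > or &, so escape those for XML separately.
    return escape(qp)


def qp_xml_bytes(text: str) -> bytes:
    """Hypothetical helper: reverse of qp_xml_text, given parsed XML text."""
    return quopri.decodestring(text.encode("ascii"))


# One field mixing ISO 8859-1, UTF-8 and Shift-JIS bytes, plus XML specials.
mixed = b"caf\xe9 caf\xc3\xa9 \x93\xfa\x96\x7b <a&b>"

doc = "<name>" + qp_xml_text(mixed) + "</name>"  # made-up element name
restored = qp_xml_bytes(ET.fromstring(doc).text)

print(doc)
print(restored == mixed)
```

The document stays pure ASCII, so any XML parser accepts it, and the consumer gets the original bytes back without having to guess an encoding. One caveat I am aware of: XML parsers normalize "\r\n" to "\n", so carriage returns would need to be escaped as well if they must survive.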
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
