Hi all,

Thursday 04 June 2009 02:53:46 Trent W. Buck wrote:
> Gwern Branwen <[email protected]> writes:
> > There isn't any schema I know of. You really just have to parse it
> > kind of ad-hoc.
>
> And as we've seen in the Darcs repo, input isn't recoded into UTF-8, so
> in *one output document* from changes --xml you can have ISO 8859-1
> bytes, UTF-8 bytes, and JIS bytes.  Which basically means it's not XML :-(

But the contents of files in the repo are not text, they are bytes (this holds 
for text files too, which are managed as lines of bytes delimited by newlines). 
How should we deal with that in XML?

A quick Google search turns up the suggestion to either use base64 or store 
the binary data outside the XML and make the XML refer to it. Both of those 
seem really bad for readability.

Perhaps we can use quoted-printable encoding(*) inside the XML? It sounds 
somewhat Frankensteinian, but we may have code for that lying around already, 
and it encodes the non-ASCII bytes while keeping the result readable as text. 
In fact, Google returns results about using quoted-printable in XML, so it's 
not that weird an idea.
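
To make the comparison concrete, here is a minimal sketch, assuming Python's 
standard quopri and base64 modules as stand-ins for whatever code Darcs might 
reuse. The sample bytes and the <line> element name are made up for 
illustration; they are not what changes --xml actually emits.

```python
import base64
import quopri
from xml.sax.saxutils import escape

# Hypothetical patch line mixing encodings: "café" as ISO 8859-1 bytes,
# then "café" as UTF-8 bytes, with angle brackets that XML must escape.
raw = b"caf\xe9 <caf\xc3\xa9>"

# Quoted-printable keeps printable ASCII and escapes the other bytes.
qp = quopri.encodestring(raw).decode("ascii")

# Base64 is also pure ASCII, but the original text is no longer recognisable.
b64 = base64.b64encode(raw).decode("ascii")

# The QP result is plain ASCII, so ordinary XML escaping can be applied on top.
xml = "<line>" + escape(qp) + "</line>"

print("quoted-printable:", qp)
print("base64:          ", b64)
print("embedded in XML: ", xml)

# A consumer recovers the original bytes by undoing the XML escaping
# (done by the XML parser) and then decoding the quoted-printable text.
assert quopri.decodestring(qp.encode("ascii")) == raw
```

One caveat with this sketch: quopri inserts soft line breaks ("=" followed by 
a newline) into long lines, so a consumer would have to decode the field 
before counting or comparing lines.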

Regards,
Reinier
(*): quoted-printable encoding is what is used for e-mail text in encodings 
other than ASCII. It leaves most printable ASCII characters as they are, but 
escapes non-ASCII bytes as "=" followed by two hex digits.

_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
