David Kastrup <[email protected]>:

> If you tell Emacs that some external entity is in UTF-8, it will
> represent all valid UTF-8 sequences as properly decoded characters,
> and it has special codes for all bytes not part of valid UTF-8.
>
> As a result, it works with valid UTF-8 perfectly as expected but will
> reproduce arbitrary byte streams thrown at it perfectly when decoding
> as UTF-8 and then reencoding into UTF-8 again.
>
> Guile is lacking this byte stream reproducibility when
> decoding/reencoding. That makes it a whole lot less robust for dealing
> with externally provided material.

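Python 3 supports this by abusing the surrogate code points (its
"surrogateescape" error handler): each undecodable byte is smuggled
through the string as a lone surrogate. I don't recommend following
Python's lead.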
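
For reference, a minimal sketch of that Python 3 behaviour, using only
the standard "surrogateescape" error handler. The sample bytes are made
up for illustration:

```python
raw = b"abc\xff\xfedef"   # 0xff and 0xfe can never occur in valid UTF-8

# Each undecodable byte 0xNN is decoded to the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")

# Re-encoding with the same handler reproduces the original bytes.
roundtrip = text.encode("utf-8", errors="surrogateescape")

# The catch: the string is not valid Unicode, so a strict encoder
# refuses it.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Here roundtrip equals raw, but strict_ok ends up False: any code that
later encodes the string without the special handler fails, which is
one way the trick leaks into the rest of the program.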

Instead, when decoding a byte string into Unicode, the decoder should
return to the application a list that alternates decoded text with the
raw bytes it could not decode:

   ( chars bytes chars bytes ... chars )

or some similar mechanism.
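
A rough sketch of what such a decoder could look like, written in
Python only because it is handy for illustration. The names
decode_segments and encode_segments are hypothetical, not an existing
API, and the repeated slicing is quadratic in the worst case, so treat
it as a demonstration of the idea rather than an implementation:

```python
def decode_segments(data: bytes) -> list:
    """Return [str, bytes, str, ..., str]; a str item may be empty."""
    segments = [""]          # invariant: the last item is always a str
    pos = 0
    while pos < len(data):
        try:
            segments[-1] += data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into the slice data[pos:]
            good = data[pos:pos + err.start].decode("utf-8")
            bad = data[pos + err.start:pos + err.end]
            if good or len(segments) == 1:
                segments[-1] += good
                segments += [bad, ""]
            else:
                # Nothing decodable since the previous bad run: merge.
                segments[-2] += bad
            pos += err.end
    return segments


def encode_segments(segments: list) -> bytes:
    """Inverse of decode_segments: str items are encoded, bytes copied."""
    return b"".join(
        s.encode("utf-8") if isinstance(s, str) else s for s in segments
    )


raw = b"abc\xff\xfedef\xc3"
parts = decode_segments(raw)
# parts == ["abc", b"\xff\xfe", "def", b"\xc3", ""]
restored = encode_segments(parts)
```

Every str item here is a valid Unicode string, the undecodable bytes
stay visibly separate, and restored is byte-for-byte equal to raw.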


Marko
