David Kastrup <[email protected]>:

> If you tell Emacs that some external entity is in UTF-8, it will
> represent all valid UTF-8 sequences as properly decoded characters,
> and it has special codes for all bytes not part of valid UTF-8.
>
> As a result, it works with valid UTF-8 perfectly as expected but will
> reproduce arbitrary byte streams thrown at it perfectly when decoding
> as UTF-8 and then reencoding into UTF-8 again.
>
> Guile is lacking this byte stream reproducibility when
> decoding/reencoding. That makes it a whole lot less robust for dealing
> with externally provided material.

Python 3 supports this by abusing the surrogate code points. I don't
recommend following Python's lead. Instead, when decoding a byte string
into Unicode, the decoder should hand the application a list:

   ( chars bytes chars bytes ... chars )

or some similar mechanism.


Marko
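
To make the two approaches concrete, here is a minimal Python sketch. The first part shows Python 3's surrogate trick (the "surrogateescape" error handler from PEP 383). The second part is one possible reading of the alternating "chars bytes chars" list suggested above. The names decode_segments and encode_segments are hypothetical and not part of Python, Guile, or Emacs, and the choice to merge adjacent undecodable bytes into one segment is an assumption of this sketch.

```python
def decode_segments(data: bytes) -> list:
    """Decode UTF-8 into an alternating list [str, bytes, str, ..., str].

    Hypothetical helper: valid UTF-8 runs become str, undecodable runs
    stay as bytes. The list always starts and ends with a str, which may
    be empty.
    """
    segments = [""]
    pos = 0
    while pos < len(data):
        try:
            # The rest of the input is valid: append it and stop.
            segments[-1] += data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start / err.end index into the slice we tried to decode.
            start, end = pos + err.start, pos + err.end
        good = data[pos:start].decode("utf-8")  # valid text before the error
        bad = data[start:end]                   # the undecodable bytes
        if good == "" and len(segments) > 1:
            # Directly follows another bad run: extend that bytes segment.
            segments[-2] += bad
        else:
            segments[-1] += good
            segments += [bad, ""]
        pos = end
    return segments


def encode_segments(segments: list) -> bytes:
    """Inverse of decode_segments: encode str parts, copy bytes parts."""
    return b"".join(
        part.encode("utf-8") if isinstance(part, str) else part
        for part in segments
    )


raw = b"caf\xc3\xa9 \xff\xfe ok"

# Python 3's approach: each invalid byte 0xNN becomes the lone surrogate
# U+DCNN inside an ordinary str, and is turned back into 0xNN on encoding.
escaped = raw.decode("utf-8", "surrogateescape")
roundtrip_escaped = escaped.encode("utf-8", "surrogateescape")

# The list approach: invalid bytes never enter a str at all.
segments = decode_segments(raw)
roundtrip_segments = encode_segments(segments)
```

Both variants reproduce the original byte stream. The difference is where the invalid bytes live: with surrogateescape they sit inside the string as code points that are not valid Unicode scalar values, whereas the list keeps every str segment well-formed and leaves it to the application to decide what to do with the bytes segments.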
