On Wed, Nov 02, 2011 at 01:29:16PM +0000, Max Bolingbroke wrote:
> On 2 November 2011 10:03, Jean-Marie Gaillourdet <j...@gaillourdet.net> wrote:
> > As far as I know, not all encodings are reversible, i.e. there are byte
> > sequences which are invalid UTF-8. Therefore, decoding and re-encoding
> > might not return the exact same byte sequence.
> 
> The PEP 383 mechanism explicitly recognises this fact and defines a
> reversible way of decoding bytes into strings. The new behaviour is
> guaranteed to be reversible except for certain private use codepoints
> (0xEF00 to 0xEFFF inclusive) which:
>  1. We do not expect to see in practice
>  2. Are unofficially standardised for use with this sort of "encoding hack"

I don't understand this.

If I understand correctly, you use U+EF00-U+EFFF to encode the
byte values 0-255 when they are not a valid part of the UTF-8 stream.

So why not encode U+EF00 (which in UTF-8 is 0xEE 0xBC 0x80) as
U+EFEE U+EFBC U+EF80, and so on? Doesn't it then become completely
reversible?
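
To make the question concrete, here is a minimal sketch of the scheme I have
in mind, written in Python rather than as a patch against the real decoder.
The names ESCAPE_BASE, decode_reversible and encode_reversible are made up
for illustration, and the byte-at-a-time handling of invalid input is my
assumption, not necessarily what the actual implementation does.

```python
# Hypothetical sketch: escape invalid bytes AND any genuine U+EF00..U+EFFF
# code point, so that bytes -> str -> bytes always round-trips.
ESCAPE_BASE = 0xEF00  # byte b is escaped as the code point U+EF00 + b


def is_escape(cp: int) -> bool:
    """True if cp lies in the escape range U+EF00..U+EFFF."""
    return ESCAPE_BASE <= cp <= ESCAPE_BASE + 0xFF


def decode_reversible(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        # Try the shortest strictly valid UTF-8 sequence starting at i.
        for n in (1, 2, 3, 4):
            chunk = data[i:i + n]
            try:
                ch = chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            if is_escape(ord(ch)):
                # A genuine escape-range character: escape each of its bytes.
                out.extend(chr(ESCAPE_BASE + b) for b in chunk)
            else:
                out.append(ch)
            i += len(chunk)
            break
        else:
            # No valid sequence starts here: escape this single byte.
            out.append(chr(ESCAPE_BASE + data[i]))
            i += 1
    return "".join(out)


def encode_reversible(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if is_escape(cp):
            out.append(cp - ESCAPE_BASE)  # turn the escape back into its byte
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)
```

With this sketch, the bytes 0xEE 0xBC 0x80 decode to U+EFEE U+EFBC U+EF80 and
encode back to the same three bytes. The round trip is only claimed in the
bytes -> string -> bytes direction; a string built by hand from arbitrary
escape-range characters need not survive string -> bytes -> string.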


Thanks
Ian