Wrong UTF-8 encoders still around?

Martin J. Dürst Thu, 20 Oct 2011 16:17:40 -0700

I'm hoping to get some advice from people with experience with various Unicode/transcoding libraries.


RFC 3987 (the current IRI spec) has the following text:


   Note: Some older software transcoding to UTF-8 may produce illegal
      output for some input, in particular for characters outside the
      BMP (Basic Multilingual Plane).  As an example, for the IRI with
      non-BMP characters (in XML Notation):
      "http://example.com/&#x10300;&#x10301;&#x10302;"
      which contains the first three letters of the Old Italic alphabet,
      the correct conversion to a URI is
      "http://example.com/%F0%90%8C%80%F0%90%8C%81%F0%90%8C%82"

We are thinking about removing this because we hope that software has improved in the meantime, but we would like to be sure about this.

If anybody knows about software out there that still exhibits this problem, please tell us.
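
For anyone who wants to check a library against the example in the note, here is a minimal Python sketch. The RFC text does not say which kind of illegal output it has in mind; the faulty converter below is a hypothetical illustration that assumes one plausible failure mode, namely encoding each UTF-16 surrogate as its own three-byte sequence (CESU-8 style). The function names are made up for this sketch.

```python
from urllib.parse import quote, unquote_to_bytes

# The IRI from the RFC 3987 note: three Old Italic letters, U+10300..U+10302.
IRI = "http://example.com/\U00010300\U00010301\U00010302"


def to_uri(iri):
    """Percent-encode the UTF-8 bytes of every non-ASCII character."""
    # quote() encodes as UTF-8 by default; ':' and '/' are left untouched.
    return quote(iri, safe=":/")


def to_uri_surrogate_bug(iri):
    """Hypothetical faulty converter: encodes each UTF-16 surrogate on its own."""
    out = []
    for ch in iri:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII passes through unchanged in this sketch
            continue
        if cp > 0xFFFF:
            # Split a non-BMP code point into a UTF-16 surrogate pair.
            v = cp - 0x10000
            units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
        else:
            units = [cp]
        for unit in units:
            # "surrogatepass" lets Python emit the (illegal) surrogate bytes.
            data = chr(unit).encode("utf-8", "surrogatepass")
            out.append("".join("%{:02X}".format(b) for b in data))
    return "".join(out)


def escapes_are_valid_utf8(uri):
    """True if the percent-decoded bytes form strictly valid UTF-8."""
    try:
        unquote_to_bytes(uri).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

With these definitions, to_uri(IRI) yields the URI given in the note, while to_uri_surrogate_bug(IRI) yields a path starting with %ED%A0%80%ED%BC%80, which escapes_are_valid_utf8 rejects. Running the same input through the library under test and passing the result to escapes_are_valid_utf8 is one way to spot this particular problem.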


Thanks,    Martin.
