Re: The Case Against Autodecode

tsbockman via Digitalmars-d Thu, 02 Jun 2016 14:06:29 -0700

On Thursday, 2 June 2016 at 20:49:52 UTC, Andrei Alexandrescu wrote:

On 06/02/2016 04:47 PM, tsbockman wrote:
That doesn't sound like much of an endorsement for defaulting to only level 1 support to me - "it does not handle more complex languages or extensions to the Unicode Standard very well".
Code point/Level 1 support sounds like a sweet spot between efficiency/complexity and conviviality. Level 2 is opt-in with byGrapheme. -- Andrei

Actually, according to the document Walter Bright linked, level 1 does NOT operate at the code point level:

Level 1: Basic Unicode Support. At this level, the regular expression engine provides support for Unicode characters as basic 16-bit logical units. (This is independent of the actual serialization of Unicode as UTF-8, UTF-16BE, UTF-16LE, or UTF-32.)
...
Level 1 support works well in many circumstances. However, it does not handle more complex languages or extensions to the Unicode Standard very well. Particularly important cases are **surrogates** ...

So, level 1 appears to be UTF-16 code units, not code points. To do code points it would have to recognize surrogates, which are specifically mentioned as not supported.
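
To make the code unit / code point distinction concrete, here is a minimal sketch in Python (not D, and not taken from the linked document). The helper names `utf16_code_units` and `is_surrogate` are made up for illustration. It shows that a character outside the Basic Multilingual Plane is one code point but two 16-bit units, both of them surrogates, so an engine that works on "basic 16-bit logical units" would see two items there:

```python
def utf16_code_units(s):
    """Return the UTF-16 code units of s as a list of integers (hypothetical helper)."""
    raw = s.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_surrogate(unit):
    """True if a 16-bit unit falls in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= unit <= 0xDFFF


bmp = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

for text in (bmp, astral):
    units = utf16_code_units(text)
    # len(text) counts code points in Python 3; len(units) counts UTF-16 code units
    print(len(text), len(units), [hex(u) for u in units])
# prints:
# 1 1 ['0xe9']
# 1 2 ['0xd834', '0xdd1e']
```

Treating the pair `0xd834 0xdd1e` as a single item is exactly the surrogate recognition that the quoted text says level 1 does not provide.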

Level 2 skips straight to graphemes, and there is no code point level.

However, this document is very old - from Unicode 3.0 and the year 2000:

While there are no surrogate characters in Unicode 3.0 (outside of private use characters), future versions of Unicode will contain them...


Perhaps level 1 has since been redefined?
