Re: "A Programmer's Introduction to Unicode"

Alastair Houghton Tue, 14 Mar 2017 01:50:30 -0700

On 13 Mar 2017, at 21:10, Khaled Hosny <khaledho...@eglug.org> wrote:
> 
> On Mon, Mar 13, 2017 at 07:18:00PM +0000, Alastair Houghton wrote:
>> On 13 Mar 2017, at 17:55, J Decker <d3c...@gmail.com> wrote:
>>> 
>>> I liked the Go implementation of a character type - a rune type, which is a
>>> code point - and strings that return runes by index.
>>> https://blog.golang.org/strings
>> 
>> IMO, returning code points by index is a mistake.  It over-emphasises
>> the importance of the code point, which helps to continue the notion
>> in some developers’ minds that code points are somehow “characters”.
>> It also leads to people unnecessarily using UCS-4 as an internal
>> representation, which seems to have very few advantages in practice
>> over UTF-16.
> 
> But there are many text operations that require access to Unicode code
> points. Take for example text layout, as mapping characters to glyphs
> and back has to operate on code points. The idea that you never need to
> work with code points is too simplistic.


I didn’t say you never needed to work with code points.  What I said is that 
there’s no advantage to UCS-4 as an encoding, and that there’s no advantage to 
being able to index a string by code point.  As it happens, I’ve written the 
kind of code you cite as an example, including glyph mapping and OpenType 
processing, and the fact is that it’s no harder to do it with a UTF-16 string 
than it is with a UCS-4 string.  Yes, certainly, surrogate pairs need to be 
decoded to map to glyphs; but that’s a *trivial* matter, particularly as the 
code point to glyph mapping is not 1:1 or even 1:N - it’s N:M, so you already 
need to cope with being able to map multiple code units in the string to 
multiple glyphs in the result.
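
For illustration, here is a minimal sketch in Python of the surrogate decoding described above. The names code_points and utf16_units are made up for this example and are not taken from any library; the sketch assumes the string is held as a list of 16-bit code units and passes unpaired surrogates through unchanged, which a real shaping engine might handle differently.

```python
def code_points(units):
    """Yield (index, code_point) pairs from a sequence of UTF-16 code units.

    The index is the position of the first code unit of each code point,
    so results can still be mapped back to offsets in the original string.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one
        # code point outside the Basic Multilingual Plane.
        if (0xD800 <= u <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            yield i, cp
            i += 2
        else:
            # BMP code unit, or an unpaired surrogate passed through as-is
            # (an assumption of this sketch).
            yield i, u
            i += 1


def utf16_units(s):
    """Example helper: encode a Python str as a list of UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[j:j + 2], "little")
            for j in range(0, len(data), 2)]
```

Calling list(code_points(utf16_units("a\U0001F600b"))) walks four code units and yields three code points, each tagged with the code unit offset where it starts. That offset is what a glyph mapping stage needs in order to map back to the string, and no random access by code point index is involved.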

Kind regards,

Alastair.

--
http://alastairs-place.net
