Re: "A Programmer's Introduction to Unicode"

2017-03-14 Richard Wordingham
On Tue, 14 Mar 2017 08:51:18 +
Alastair Houghton wrote:
> On 14 Mar 2017, at 02:03, Richard Wordingham
> wrote:
> >
> > On Mon, 13 Mar 2017 19:18:00 +
> > Alastair Houghton wrote:
> > The

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Steffen Nurpmeso
Alastair Houghton wrote:
|On 13 Mar 2017, at 21:10, Khaled Hosny wrote:
|> On Mon, Mar 13, 2017 at 07:18:00PM +, Alastair Houghton wrote:
|>> On 13 Mar 2017, at 17:55, J Decker wrote:
|>>>
|>>> I liked the Go

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Manish Goregaokar
Ah, it was what I thought you were talking about -- I wasn't aware they were considered word boundaries :) Thanks for the links!

On Mar 13, 2017 4:54 PM, "Richard Wordingham" <richard.wording...@ntlworld.com> wrote:

On Mon, 13 Mar 2017 15:26:00 -0700
Manish Goregaokar

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Alastair Houghton
On 13 Mar 2017, at 21:10, Khaled Hosny wrote:
>
> On Mon, Mar 13, 2017 at 07:18:00PM +, Alastair Houghton wrote:
>> On 13 Mar 2017, at 17:55, J Decker wrote:
>>>
>>> I liked the Go implementation of character type - a rune type - which is a
>>>

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Alastair Houghton
On 14 Mar 2017, at 02:03, Richard Wordingham wrote:
>
> On Mon, 13 Mar 2017 19:18:00 +
> Alastair Houghton wrote:
>
>> IMO, returning code points by index is a mistake. It over-emphasises
>> the importance of the code point,
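A small Python illustration of the point about indexing by code point (an example of ours, not from the thread): Python's `str` is indexed by code point, so "é" written as "e" plus U+0301 COMBINING ACUTE ACCENT occupies two indices, and `s[0]` returns a bare "e" with the accent split off. Only the standard `unicodedata` module is used.

```python
import unicodedata

# "é" written as two code points: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
decomposed = "e\u0301"
# NFC normalization folds this particular pair into the single code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))                  # 2 code points
print(len(composed))                    # 1 code point
print(decomposed[0] == "e")             # True: indexing split the accent off
print(len(decomposed.encode("utf-8")))  # 3 bytes: 1 for "e", 2 for U+0301
print(decomposed == composed)           # False: same text, different code points
```

Normalization only rescues cases where a precomposed character exists; in general the unit a user perceives as one character is a grapheme cluster (UAX #29), which may span several code points whatever the encoding.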

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Philippe Verdy
Per definition yes, but UTC-4 is not Unicode. As well (any UCS-4 code unit) & 0xFFE0 == 0 (i.e. 21 bits) is not Unicode, UTF-32 is Unicode (more restrictive than just 21 bits which would allow 32 planes instead of just the 17 first ones). I suppose he meant 21 bits, not 11 bits which covers
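A minimal Python sketch of the distinction drawn above, between a 32-bit value that merely fits in 21 bits and a valid UTF-32 code unit. It assumes the intended test is a mask over the upper 11 bits of a 32-bit unit, i.e. `0xFFE00000` (the snippet shows `0xFFE0`, which would instead test bits 5 to 15); the helper names are hypothetical.

```python
UPPER_11_BITS = 0xFFE00000  # bits 21..31 of a 32-bit unit

def fits_in_21_bits(u: int) -> bool:
    """True if a 32-bit unit has none of its upper 11 bits set."""
    return (u & UPPER_11_BITS) == 0

def is_utf32_code_unit(u: int) -> bool:
    """True if u is a Unicode scalar value: 0..0x10FFFF, excluding surrogates."""
    return 0 <= u <= 0x10FFFF and not (0xD800 <= u <= 0xDFFF)

def plane(u: int) -> int:
    """Plane number: the bits above the low 16."""
    return u >> 16

# 21 bits reach U+1FFFFF, i.e. planes 0..31 (32 planes) ...
print(plane(0x1FFFFF) + 1)   # 32
# ... while UTF-32 stops at U+10FFFF, i.e. planes 0..16 (17 planes)
print(plane(0x10FFFF) + 1)   # 17

for u in (0x41, 0x10FFFF, 0x110000, 0xD800):
    print(hex(u), fits_in_21_bits(u), is_utf32_code_unit(u))
```

The last two sample values fit in 21 bits but are rejected as UTF-32: U+110000 lies beyond plane 16, and U+D800 is a surrogate.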

RE: "A Programmer's Introduction to Unicode"

2017-03-14 Doug Ewell
Philippe Verdy wrote:
>>> Well, you do have eleven bits for flags per codepoint, for example.
>>
>> That's not UCS-4; that's a custom encoding.
>>
>> (any UCS-4 code unit) & 0xFFE0 == 0 (changing to "UTF-32" per Ken's observation)
> Per definition yes, but UTC-4 is not Unicode.

I guess

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Doug Ewell
Steffen Nurpmeso wrote:
>> I didn’t say you never needed to work with code points. What I said
>> is that there’s no advantage to UCS-4 as an encoding, and that
>
> Well, you do have eleven bits for flags per codepoint, for example.

That's not UCS-4; that's a custom encoding.

(any UCS-4 code
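The "eleven bits for flags" idea can be sketched in Python as follows, assuming (our choice, not the thread's) that the code point sits in the low 21 bits and the flags in the upper 11 bits of a 32-bit word. `pack` and `unpack` are hypothetical helpers. The sketch also shows why this is a custom encoding: once any flag bit is set, the word exceeds U+10FFFF and is no longer a UTF-32 code unit.

```python
CP_BITS = 21
CP_MASK = (1 << CP_BITS) - 1       # 0x001FFFFF: low 21 bits hold the code point
FLAG_LIMIT = 1 << (32 - CP_BITS)   # 2**11 = 2048 distinct flag values

def pack(cp: int, flags: int) -> int:
    """Store a code point in the low 21 bits and flags in the upper 11 bits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if not 0 <= flags < FLAG_LIMIT:
        raise ValueError("flags must fit in 11 bits")
    return (flags << CP_BITS) | cp

def unpack(word: int) -> tuple[int, int]:
    """Split a packed 32-bit word back into (code point, flags)."""
    return word & CP_MASK, word >> CP_BITS

word = pack(0x41, 0b1)    # "A" with flag bit 0 set
print(hex(word))          # 0x200041
print(unpack(word))       # (65, 1)
print(word <= 0x10FFFF)   # False: no longer a valid UTF-32 code unit
```

Any consumer expecting UTF-32 has to mask the flags off first, which is exactly the point that such a buffer is not UCS-4 or UTF-32 on the wire.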