Re: "A Programmer's Introduction to Unicode"

2017-03-15 Thread Steffen Nurpmeso
"Doug Ewell" wrote: |Philippe Verdy wrote: |>>> Well, you do have eleven bits for flags per codepoint, for example. |>> |>> That's not UCS-4; that's a custom encoding. |>> |>> (any UCS-4 code unit) & 0xFFE0 == 0 | |(changing to "UTF-32" per Ken's observation) | |>

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Richard Wordingham
On Tue, 14 Mar 2017 08:51:18 + Alastair Houghton wrote: > On 14 Mar 2017, at 02:03, Richard Wordingham > wrote: > > > > On Mon, 13 Mar 2017 19:18:00 + > > Alastair Houghton wrote: > > The

RE: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Doug Ewell
Philippe Verdy wrote: >>> Well, you do have eleven bits for flags per codepoint, for example. >> >> That's not UCS-4; that's a custom encoding. >> >> (any UCS-4 code unit) & 0xFFE00000 == 0 (changing to "UTF-32" per Ken's observation) > Per definition yes, but UCS-4 is not Unicode. I guess

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Philippe Verdy
Per definition yes, but UCS-4 is not Unicode. Likewise, (any UCS-4 code unit) & 0xFFE00000 == 0 (i.e. 21 bits) is not Unicode; UTF-32 is Unicode (more restrictive than just 21 bits, which would allow 32 planes instead of just the first 17). I suppose he meant 21 bits, not 11 bits which covers
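The difference between "fits in 21 bits" and "is valid UTF-32" can be sketched in a few lines of Python. This is only an illustration, assuming the intended mask covers the top eleven bits of a 32-bit unit (0xFFE00000); the function and constant names are made up for the example.

```python
MAX_CODE_POINT = 0x10FFFF      # last code point of plane 16
HIGH_11_BITS = 0xFFE00000      # assumed mask: top eleven bits of a 32-bit unit
PLANE_SIZE = 0x10000           # 65,536 code points per plane


def fits_in_21_bits(unit: int) -> bool:
    """True if the top eleven bits of a 32-bit unit are all zero."""
    return (unit & HIGH_11_BITS) == 0


def is_valid_utf32_unit(unit: int) -> bool:
    """True if the unit is a Unicode scalar value (in range, not a surrogate)."""
    return 0 <= unit <= MAX_CODE_POINT and not (0xD800 <= unit <= 0xDFFF)


# 21 bits can address 32 planes; Unicode defines only the first 17.
planes_in_21_bits = (1 << 21) // PLANE_SIZE
planes_in_unicode = (MAX_CODE_POINT + 1) // PLANE_SIZE

# 0x110000 passes the 21-bit mask test but is not a valid UTF-32 code unit.
beyond_unicode = 0x110000
```

The mask test alone therefore accepts values such as 0x110000 and the surrogate range, which UTF-32 excludes.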

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Doug Ewell
Steffen Nurpmeso wrote: >> I didn’t say you never needed to work with code points. What I said >> is that there’s no advantage to UCS-4 as an encoding, and that > > Well, you do have eleven bits for flags per codepoint, for example. That's not UCS-4; that's a custom encoding. (any UCS-4 code

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Steffen Nurpmeso
Alastair Houghton wrote: |On 13 Mar 2017, at 21:10, Khaled Hosny wrote: |> On Mon, Mar 13, 2017 at 07:18:00PM +, Alastair Houghton wrote: |>> On 13 Mar 2017, at 17:55, J Decker wrote: |>>> |>>> I liked the Go

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Alastair Houghton
On 14 Mar 2017, at 02:03, Richard Wordingham wrote: > > On Mon, 13 Mar 2017 19:18:00 + > Alastair Houghton wrote: > >> IMO, returning code points by index is a mistake. It over-emphasises >> the importance of the code point,

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Alastair Houghton
On 13 Mar 2017, at 21:10, Khaled Hosny wrote: > > On Mon, Mar 13, 2017 at 07:18:00PM +, Alastair Houghton wrote: >> On 13 Mar 2017, at 17:55, J Decker wrote: >>> >>> I liked the Go implementation of character type - a rune type - which is a >>>

Re: "A Programmer's Introduction to Unicode"

2017-03-14 Thread Manish Goregaokar
Ah, it was what I thought you were talking about -- I wasn't aware they were considered word boundaries :) Thanks for the links! On Mar 13, 2017 4:54 PM, "Richard Wordingham" < richard.wording...@ntlworld.com> wrote: On Mon, 13 Mar 2017 15:26:00 -0700 Manish Goregaokar

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Richard Wordingham
On Mon, 13 Mar 2017 19:18:00 + Alastair Houghton wrote: > IMO, returning code points by index is a mistake. It over-emphasises > the importance of the code point, which helps to continue the notion > in some developers’ minds that code points are somehow

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Richard Wordingham
On Mon, 13 Mar 2017 20:20:25 -0400 "Mark E. Shoulson" wrote: > Sanskrit external vowel sandhi is comparatively > straightforward (compared to consonant sandhi), and it frequently > loses information. A *or* AA plus I is E; A *or* AA plus U is O (you > need A + O to get AU).

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Mark E. Shoulson
A word ending in A *or* AA preceding a word beginning in A *or* AA will all coalesce to a single AA in Sanskrit. That's four possibilities, and that doesn't count a word ending in a consonant preceding a word beginning in AA, which would be written the same. My memory is rusty, so I should

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Richard Wordingham
On Mon, 13 Mar 2017 15:26:00 -0700 Manish Goregaokar wrote: > Do you have examples of AA being split that way (and further reading)? > I think I'm aware of what you're talking about, but would love to read > more about it. Just googling for the three words 'Sanskrit',

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Manish Goregaokar
Do you have examples of AA being split that way (and further reading)? I think I'm aware of what you're talking about, but would love to read more about it. -Manish On Mon, Mar 13, 2017 at 2:47 PM, Richard Wordingham wrote: > On Mon, 13 Mar 2017 23:10:11 +0200 >

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Richard Wordingham
On Mon, 13 Mar 2017 23:10:11 +0200 Khaled Hosny wrote: > But there are many text operations that require access to Unicode code > points. Take for example text layout, as mapping characters to glyphs > and back has to operate on code points. The idea that you never need >

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Khaled Hosny
On Mon, Mar 13, 2017 at 07:18:00PM +, Alastair Houghton wrote: > On 13 Mar 2017, at 17:55, J Decker wrote: > > > > I liked the Go implementation of character type - a rune type - which is a > > codepoint, and strings that return runes by index. > >

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Alastair Houghton
On 13 Mar 2017, at 17:55, J Decker wrote: > > I liked the Go implementation of character type - a rune type - which is a > codepoint, and strings that return runes by index. > https://blog.golang.org/strings IMO, returning code points by index is a mistake. It

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread J Decker
I liked the Go implementation of character type - a rune type - which is a codepoint, and strings that return runes by index. https://blog.golang.org/strings Doesn't solve the problem for composited codepoints though... texel looks to be defined as a graphic element already. TEXture
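The point about composed sequences can be illustrated with a small sketch. It uses Python rather than Go, on the assumption that Python's `str`, which is indexed by code point, is a fair stand-in for a sequence of runes; the variable names are made up for the example.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
# two code points.
decomposed = "e\u0301"

first_code_point = decomposed[0]       # indexing yields the bare "e"
code_point_count = len(decomposed)     # counts code points, not visible characters

# NFC normalization merges the pair into U+00E9 because a precomposed
# form exists for this particular sequence.
composed = unicodedata.normalize("NFC", decomposed)

# There is no precomposed "q with acute", so this sequence stays two
# code points even after normalization.
still_two = unicodedata.normalize("NFC", "q\u0301")
```

Indexing by code point splits the base letter from its accent, and normalization only helps where a precomposed character happens to exist.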

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Janusz S. Bien
Quote/Cytat - J Decker (Mon 13 Mar 2017 06:55:18 PM CET): texel looks to be defined as a graphic element already. TEXture ELement. I'm aware of it, but homonymy/polysemy is something we have to live with. I think there is no risk of confusing texture elements with text

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Janusz S. Bien
Quote/Cytat - Asmus Freytag (Mon 13 Mar 2017 06:00:08 PM CET): [...] This (or similar) scenarios indicate the impossibility to come to a single, universal definition of a "textel" -- the main reason why this term is of lower utility than "pixel". I agree that it is

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Asmus Freytag
On 3/13/2017 3:31 AM, Janusz S. Bien wrote: Just yet another reason for introducing the notion of textel? The main difference between "textel" and "pixel" is that the unit of processing /displaying text is not uniform and fixed,

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread William_J_G Overington
Prof. Janusz S. Bień wrote: > Just yet another reason for introducing the notion of textel? I opine that it would be a good idea to introduce several new words, of which textel would be one, with each such new word having a precisely-defined meaning so that in precise discussions of

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Janusz S. Bien
Quote/Cytat - William_J_G Overington (Mon 13 Mar 2017 12:24:13 PM CET): Prof. Janusz S. Bień wrote: Just yet another reason for introducing the notion of textel? I opine that it would be a good idea to introduce several new words, of which textel would be

Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Janusz S. Bien
Quote/Cytat - Richard Wordingham (Sun 12 Mar 2017 09:10:22 PM CET): On Sun, 12 Mar 2017 20:02:28 +0100 "Janusz S. Bien" wrote: If the basic notion has to be referred in a cumbersome way as "extended grapheme cluster" then it is easier

Re: "A Programmer's Introduction to Unicode"

2017-03-12 Thread Richard Wordingham
On Sun, 12 Mar 2017 20:02:28 +0100 "Janusz S. Bien" wrote: > If the basic notion has to be referred in a cumbersome way as > "extended grapheme cluster" then it is easier to talk about "Unicode > characters" despite the fact that they have a rather loose relation > to

Re: "A Programmer's Introduction to Unicode"

2017-03-12 Thread Janusz S. Bien
Quote/Cytat - Manish Goregaokar (Sun 12 Mar 2017 07:43:22 PM CET): This is just another confirmation that the present Unicode terminology is confusing. I find this to be a symptom of our pedagogy around "characters" in programming; most folks get taught that characters

Re: "A Programmer's Introduction to Unicode"

2017-03-12 Thread Manish Goregaokar
> This is just another confirmation that the present Unicode terminology is confusing. I find this to be a symptom of our pedagogy around "characters" in programming; most folks get taught that characters are bytes are code points, especially because many languages try to make this the case. The
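A short Python sketch shows why "characters are bytes are code points" does not hold in general. The sample character is an arbitrary choice for illustration; only standard `str` and `bytes` operations are used.

```python
# U+1F600 GRINNING FACE: a single code point outside the Basic
# Multilingual Plane.
text = "\U0001F600"

code_points = len(text)                              # Python str counts code points
utf8_bytes = len(text.encode("utf-8"))               # bytes in UTF-8
utf16_units = len(text.encode("utf-16-le")) // 2     # 16-bit code units in UTF-16
utf32_units = len(text.encode("utf-32-le")) // 4     # 32-bit code units in UTF-32

summary = {
    "code points": code_points,
    "UTF-8 bytes": utf8_bytes,
    "UTF-16 code units": utf16_units,
    "UTF-32 code units": utf32_units,
}
```

For this one character the four counts are 1, 4, 2 and 1 respectively, so the "length" of a string depends entirely on which unit is being counted.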

Re: "A Programmer's Introduction to Unicode"

2017-03-11 Thread Janusz S. Bień
On Fri, Mar 10 2017 at 19:55 CET, man...@mozilla.com writes: > I recently wrote > http://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ > , which sort of addresses the whole hangup programmers have with > treating code points as "characters". [...] This is

Re: "A Programmer's Introduction to Unicode"

2017-03-10 Thread Manish Goregaokar
I recently wrote http://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ , which sort of addresses the whole hangup programmers have with treating code points as "characters". I also wrote

Re: "A Programmer's Introduction to Unicode"

2017-03-10 Thread Khaled Hosny
On Fri, Mar 10, 2017 at 05:00:55PM +, Peter Constable wrote: > FYI: > > http://reedbeta.com/blog/programmers-intro-to-unicode/ > > The visuals may be the most interesting part. E.g., in the usage heat > map, Arabic Presentation Forms-B lights up much more than I would have > expected I