On Thu, Jun 1, 2017 at 10:32 PM, Paul Gilmartin < [email protected]> wrote:

> On Thu, 1 Jun 2017 18:23:09 -0400, Thomas David Rivers wrote:
> >>
> >I find that very odd...
> >
> >strcasecmp() is obliged to convert all upper-case letters into lower-case
> >for the comparison.
> >
> Wouldn't it be a fiasco if it effectively waffled on ligatures?
>

Hum, I can't see where ligatures are of any concern. Assuming that I understand them, they are just a result of very tight kerning of two separate letters. E.g. "tucking" the "i" under the "roof" of the letter "F". In memory this is still "Fi" - two separate runes (in Go speak - they distinguish "character" versus "rune" or "UNICODE code point". ref: https://blog.golang.org/strings)

[quote]
...
Code points, characters, and runes

We've been very careful so far in how we use the words "byte" and "character". That's partly because strings hold bytes, and partly because the idea of "character" is a little hard to define. The Unicode standard uses the term "code point" to refer to the item represented by a single value. The code point U+2318, with hexadecimal value 2318, represents the symbol ⌘. (For lots more information about that code point, see its Unicode page.)

To pick a more prosaic example, the Unicode code point U+0061 is the lower case Latin letter 'A': a.

But what about the lower case grave-accented letter 'A', à? That's a character, and it's also a code point (U+00E0), but it has other representations. For example we can use the "combining" grave accent code point, U+0300, and attach it to the lower case letter a, U+0061, to create the same character à. In general, a character may be represented by a number of different sequences of code points, and therefore different sequences of UTF-8 bytes.

The concept of character in computing is therefore ambiguous, or at least confusing, so we use it with care.
To make things dependable, there are normalization techniques that guarantee that a given character is always represented by the same code points, but that subject takes us too far off the topic for now. A later blog post will explain how the Go libraries address normalization.

"Code point" is a bit of a mouthful, so Go introduces a shorter term for the concept: rune. The term appears in the libraries and source code, and means exactly the same as "code point", with one interesting addition.
[quote/]

>
> -- gil
>

--
Windows. A funny name for a operating system that doesn't let you see anything.

Maranatha! <><
John McKown

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
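
For anyone who wants to poke at the byte / code point / character distinction quoted above, here is a minimal sketch in Go. It is only an illustration, not anything from the blog post itself: the helper name describe and the sample strings are made up for this example. It assumes strings.EqualFold is an acceptable stand-in for strcasecmp(); EqualFold compares under simple Unicode case folding and does no normalization.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// describe is a hypothetical helper for this sketch: it reports how many
// bytes and how many runes (code points) the string s holds.
func describe(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e0" // à as the single code point U+00E0
	combining := "a\u0300"  // a (U+0061) followed by combining grave (U+0300)
	twoLetters := "Fi"      // two ordinary letters, however a font kerns them

	for _, s := range []string{precomposed, combining, twoLetters} {
		b, r := describe(s)
		fmt.Printf("%q: %d bytes, %d runes, %U\n", s, b, r, []rune(s))
	}

	// Same character on screen, different code point sequences in memory,
	// so a plain comparison treats them as different strings.
	fmt.Println(precomposed == combining) // false

	// Case-insensitive comparison folds case rune by rune.
	fmt.Println(strings.EqualFold("Fi", "fI")) // true

	// Folding case does not normalize, so the two spellings of à
	// still compare unequal.
	fmt.Println(strings.EqualFold(precomposed, combining)) // false
}
```

The point of the last comparison is that case folding and normalization are separate steps: a case-insensitive compare that wants to treat both spellings of à as equal would have to normalize first, which is the subject the quoted passage defers to a later post.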
