Re: strcasecmp() comparing punctuation in ASCII?

Allan Staller Fri, 02 Jun 2017 06:42:08 -0700

John,

I appreciate your attention to detail, but you have waaaaayyyyy too much time 
on your hands! <G>

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of John McKown
Sent: Friday, June 2, 2017 7:43 AM
To: [email protected]
Subject: Re: strcasecmp() comparing punctuation in ASCII?

On Thu, Jun 1, 2017 at 10:32 PM, Paul Gilmartin < 
[email protected]> wrote:

> On Thu, 1 Jun 2017 18:23:09 -0400, Thomas David Rivers wrote:
> >>
> >I find that very odd...
> >
> >strcasecmp() is obliged to convert all upper-case letters into 
> >lower-case for the comparison.
> >
> Wouldn't it be a ﬁasco if it eﬀectively waﬄed on ligatures?
>

Hmm, I can't see where ligatures are of any concern. Assuming that I understand 
them, they are just the result of very tight kerning of two separate letters, 
e.g. "tucking" the "i" under the "roof" of the letter "f". In memory this is 
still "fi": two separate runes (in Go-speak; Go distinguishes "character" 
from "rune", its name for a Unicode code point. Ref:
https://blog.golang.org/strings)
[quote]
...

Code points, characters, and runes

We've been very careful so far in how we use the words "byte" and "character". 
That's partly because strings hold bytes, and partly because the idea of 
"character" is a little hard to define. The Unicode standard uses the term 
"code point" to refer to the item represented by a single value. The code point 
U+2318, with hexadecimal value 2318, represents the symbol ⌘. (For lots more 
information about that code point, see its Unicode
page.)

To pick a more prosaic example, the Unicode code point U+0061 is the lower case 
Latin letter 'A': a.

But what about the lower case grave-accented letter 'A', à? That's a character, 
and it's also a code point (U+00E0), but it has other representations. For 
example we can use the "combining" grave accent code point, U+0300, and attach 
it to the lower case letter a, U+0061, to create the same character à. In 
general, a character may be represented by a number of different sequences of 
code points, and therefore different sequences of UTF-8 bytes.

The concept of character in computing is therefore ambiguous, or at least 
confusing, so we use it with care. To make things dependable, there are 
normalization techniques that guarantee that a given character is always 
represented by the same code points, but that subject takes us too far off the 
topic for now. A later blog post will explain how the Go libraries address 
normalization.

"Code point" is a bit of a mouthful, so Go introduces a shorter term for the 
concept: rune. The term appears in the libraries and source code, and means 
exactly the same as "code point", with one interesting addition.

[quote/]
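
To make the excerpt concrete, here is a minimal Go sketch (mine, not from the 
blog post) that counts bytes and runes for the two spellings of à described 
above. The helper name describe is made up for illustration. As an aside, it 
also prints U+FB01, which Unicode defines as a single precomposed "fi" ligature 
code point, next to the plain two-letter "fi", so the difference in memory is 
visible.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper: it returns the number of UTF-8 bytes
// and the number of runes (code points) in s.
func describe(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e0" // à as the single code point U+00E0
	combining := "a\u0300"  // a (U+0061) followed by combining grave accent (U+0300)
	ligature := "\ufb01"    // precomposed "fi" ligature, U+FB01
	plain := "fi"           // two ordinary letters

	for _, s := range []string{precomposed, combining, ligature, plain} {
		b, r := describe(s)
		fmt.Printf("%q: %d bytes, %d runes, code points %U\n", s, b, r, []rune(s))
	}

	// The two spellings of à look alike on screen, but without
	// normalization they are different strings.
	fmt.Println("precomposed == combining:", precomposed == combining)
}
```

A plain == comparison treats the two spellings of à as different strings, 
which is the gap the normalization techniques mentioned in the excerpt are 
meant to close.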

>
> -- gil
>

--
Windows. A funny name for an operating system that doesn't let you see anything.

Maranatha! <><
John McKown

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions, send email to 
[email protected] with the message: INFO IBM-MAIN


