On 07.05.2023 19:41, Phil Smith III wrote:
Seymour J Metz wrote:
I've seen Logical Not (¬) at AA and at AC. Are there any ASCII-based
code pages that have it at a third position? Put another way, is there
a third code point that ooRexx and Regina should recognize as ¬?
And later:
UTF-8 is just a transform of Unicode, and the Unicode code point is
AC. The string C2AC is just a way of encoding AC.
Not quite. Yes, hex C2AC is the UTF-8 encoding of the Unicode NOT sign. Unicode
is a list of code points and, as you said, UTF-8 is an encoding. The Unicode
code point is U+00AC. It is NOT “AC”, nor “hex AC”. Yes, I’m being picky, but
this matters. The point is, U+00AC—the Unicode expression of that code
point—has a specific meaning, which then *must* be encoded somehow (UTF-8,
UTF-16, UTF-32); “AC” is meaningless in a Unicode context.
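To make the code point versus encoding distinction concrete, here is a minimal Python sketch (Python is an arbitrary choice here, and the variable names are illustrative only). It takes the single code point U+00AC and shows that the bytes differ depending on which encoding is applied:

```python
# One abstract code point, three different byte sequences.
not_sign = "\u00ac"                      # U+00AC NOT SIGN

label = f"U+{ord(not_sign):04X}"         # the Unicode notation for the code point

utf8 = not_sign.encode("utf-8")          # two bytes
utf16 = not_sign.encode("utf-16-be")     # two bytes, big-endian, no BOM
utf32 = not_sign.encode("utf-32-be")     # four bytes, big-endian, no BOM

print(label, utf8.hex(), utf16.hex(), utf32.hex())
# prints: U+00AC c2ac 00ac 000000ac
```

The big-endian variants are used only so that no byte order mark is prepended to the output.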
This is especially confusing since “plain ol’ ASCII” maps directly to the first
part of UTF-8-encoded Unicode. This is of course A Good Thing in general, but
lets people cheat and get away with it—until they don’t.
It gets even more confusing because ISO 8859-1 *looks* like Unicode in that,
for example, a hex AC is the NOT sign in 8859-1. But that’s 8859-1, not
Unicode, not UTF-8. A hex AC *is not a character* in UTF-8: it’s an error. I’ve
seen customers take data that’s UTF-8 and think it’s 8859-1. This mostly works.
“Mostly” is not good.
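A small Python sketch of the "mostly works" failure mode described above (the sample strings are made up for illustration): a lone hex AC is rejected by a strict UTF-8 decoder, while UTF-8 data read as 8859-1 survives only as long as it stays in the ASCII range.

```python
# A lone 0xAC byte is a continuation byte with no lead byte: invalid UTF-8.
lone = b"\xac"
try:
    lone.decode("utf-8")
    lone_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_is_valid_utf8 = False

# The same byte is a perfectly good NOT sign in ISO 8859-1.
as_latin1 = lone.decode("iso-8859-1")

# UTF-8 data misread as ISO 8859-1: ASCII passes through, the NOT sign does not.
ascii_only = "a b".encode("utf-8").decode("iso-8859-1")
misread = "a \u00ac b".encode("utf-8").decode("iso-8859-1")

print(lone_is_valid_utf8, repr(as_latin1), repr(ascii_only), repr(misread))
```

The misread string contains two characters (U+00C2 followed by U+00AC) where the original had one, and no error is raised along the way.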
As for your original question, I’m more than willing to believe in some code
page with hex AA as the NOT sign, just never seen it. Hard to search for, too,
alas. Do you know what page that is?
I’m a bit chary* of blindly accepting multiple code points as NOT signs. Better
to know how your input is encoded (or mandate it). Unless, of course, it can be
demonstrated that this particular multilingualism cannot be misinterpreted.
...phsiii
*no “char” pun intended
It seems that there is also a wide NOT code point (FULLWIDTH NOT SIGN, U+FFE2):
<https://codepoints.net/U+FFE2>.
Also, if one clicks the "show more" button in the "Representations" section at the top of
<https://codepoints.net/U+00AC?lang=en>, one gets a list of encodings of the NOT sign.
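A similar list can be produced locally. The following is only a sketch: the helper name `not_sign_positions` and the choice of codecs are assumptions made for illustration, and the result reflects Python's bundled codec tables rather than any authoritative registry.

```python
# Hypothetical helper: ask a handful of single-byte codecs where U+00AC lands.
CANDIDATE_CODECS = ["latin-1", "cp1252", "cp437", "cp850", "cp037", "cp500"]

def not_sign_positions(codecs=CANDIDATE_CODECS):
    """Map each codec name to the bytes it uses for U+00AC, or None."""
    positions = {}
    for name in codecs:
        try:
            positions[name] = "\u00ac".encode(name)
        except UnicodeEncodeError:
            positions[name] = None  # the code page has no NOT sign
    return positions

positions = not_sign_positions()
for name, raw in positions.items():
    print(f"{name:8} {raw.hex().upper() if raw else 'not representable'}")
```

With CPython's tables, latin-1 should report AC, the PC code page cp437 should report AA, and the EBCDIC page cp037 should report 5F, which suggests where the AA sighting may have come from.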
---rony