On 16 August 2015 at 18:35, Blake McBride <[email protected]> wrote:

> My own opinion:
>
> 1.  Very strongly -  *⍴,'A̲'*  has got to equal 1 no matter what !!
>

You may think so, but if you want to apply that consistently, you would have
to implement a completely new character set and abandon Unicode.

I'll give you an example. What would you want ⍴,'ä' to be?

Right now, that could return either 1 or 2 depending on whether the ä was
written as the precomposed character (U+00E4) or as a base letter followed
by a combining mark (U+0061, U+0308). Visually, these are identical, and
generally you'd expect them to compare equal.
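A minimal sketch of the two spellings, in Python rather than APL. It assumes Python's `len` on a `str` is a fair stand-in for ⍴ on a character vector, since both count code points:

```python
# Two spellings of "ä": one precomposed code point, or "a" plus a combining mark.
precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # U+0061 followed by U+0308 COMBINING DIAERESIS

print(len(precomposed))           # 1
print(len(decomposed))            # 2
print(precomposed == decomposed)  # False: compared code point by code point
```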

In Unicode, equivalent strings that are made up of different code points
are compared by first performing a normalisation step on both. There are
four different normalisation forms (NFC, NFD, NFKC and NFKD)
<http://unicode.org/reports/tr15/>, each with different behaviour.
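A sketch of "normalise, then compare" using Python's standard `unicodedata` module. The helper name `canonically_equal` is hypothetical. The string also includes the ligature ﬁ (U+FB01) so the loop can show how the compatibility forms (NFKC, NFKD) differ from the canonical ones:

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after bringing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e4"   # ä as one code point
decomposed = "a\u0308"   # ä as "a" + combining diaeresis

print(precomposed == decomposed)                   # False
print(canonically_equal(precomposed, decomposed))  # True

# The four forms differ: NFC/NFKC compose, NFD/NFKD decompose,
# and only the K (compatibility) forms split the ligature U+FB01 into "fi".
sample = "a\u0308\ufb01"
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, [hex(ord(c)) for c in unicodedata.normalize(form, sample)])
```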

Now, the ä character has a precomposed form in Unicode, so if you combine
that with NFC normalisation, the above expression would return 1.
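A sketch of what ⍴,'ä' would look like if the interpreter normalised to NFC first. The function `rho_nfc` is hypothetical and only illustrates the idea; it is not something GNU APL provides:

```python
import unicodedata

def rho_nfc(s: str) -> int:
    """Hypothetical: length of a character vector after NFC normalisation."""
    return len(unicodedata.normalize("NFC", s))

decomposed = "a\u0308"   # "a" + U+0308 COMBINING DIAERESIS

print(len(decomposed))   # 2 without normalisation
print(rho_nfc(decomposed))  # 1: NFC composes it into U+00E4
```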

*However,* ä only works because a precomposed form is available. The
combining underline has no such form. So if you want to suggest that the
expression applied to an underlined character should return 1, you *also*
have to suggest what ⎕UCS X should return when X is that underlined
character. Remember that ⎕UCS has to satisfy (X=⎕UCS ⎕UCS X).
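A sketch of the underline case. It assumes the underlined A is "A" followed by U+0332 COMBINING LOW LINE. The two `ucs_*` functions are hypothetical analogues of ⎕UCS applied to characters and to integers:

```python
import unicodedata

underlined = "A\u0332"   # assumed: "A" + U+0332 COMBINING LOW LINE

# NFC cannot help here: Unicode has no precomposed "A with underline".
print(len(unicodedata.normalize("NFC", underlined)))  # 2

def ucs_of_chars(s: str) -> list[int]:
    """Hypothetical analogue of ⎕UCS on a character vector: one integer per code point."""
    return [ord(c) for c in s]

def ucs_of_ints(ns: list[int]) -> str:
    """Hypothetical analogue of ⎕UCS on an integer vector."""
    return "".join(chr(n) for n in ns)

codes = ucs_of_chars(underlined)
print(codes)                             # [65, 818]
print(ucs_of_ints(codes) == underlined)  # True: the round trip holds per code point
```

The round trip works because each element maps to exactly one integer. If 'A̲' were counted as a single element, ⎕UCS would need one integer for it, and Unicode does not assign one.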

Regards,
Elias
