I understand what you are saying. However, rather than bend the outcome to fit technical difficulty or complexity, I prefer to put in whatever technical effort it takes to produce the desired outcome.
⍴,'A̲' and ⍴,'ä' should each produce exactly 1 regardless of the underlying technicalities.

On Sun, Aug 16, 2015 at 5:49 AM, Elias Mårtenson <[email protected]> wrote:

> On 16 August 2015 at 18:35, Blake McBride <[email protected]> wrote:
>
>> My own opinion:
>>
>> 1. Very strongly - *⍴,'A̲'* has got to equal 1 no matter what !!
>
> You may think so, but if you want to be consistent on that, you would have
> to implement a completely new character set and abandon Unicode.
>
> I'll give you an example. What would you want ⍴,'ä' to be?
>
> Right now, that could return either 1 or 2 depending on whether the ä was
> using the precomposed character (U+00E4) or the combining mark (U+0061,
> U+0308). Visually, these are identical, and generally you'd expect them to
> compare equal.
>
> In Unicode, the comparison of equivalent (but with different characters)
> strings are done by performing a normalisation step prior to comparison.
> There are 4 different types of normalisation
> <http://unicode.org/reports/tr15/>, with different behaviour.
>
> Now, the ä character has a precomposed form in Unicode, and if you couple
> that with the NFC normalisation form, you'd get the above expression to
> return 1.
>
> *However,* the reason for ä working is only because there is a
> precomposed form available. The combining underline does not have that. So
> if you want to suggest that the expression applied on an underlined
> character should return 1, you *also* have to provide a suggestion as to
> what ⎕UCS X should return. Remember that ⎕UCS has to satisfy (X=⎕UCS ⎕UCS
> X).
>
> Regards,
> Elias
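
For anyone who wants to see the distinction in the quoted message concretely, here is a rough sketch in Python (not APL, and not a proposal for how GNU APL should implement anything). It uses the standard `unicodedata` module. The helper names `nfc_length` and `base_char_count` are made up for illustration, and counting characters with a zero combining class is only a crude stand-in for proper grapheme segmentation.

```python
import unicodedata

def nfc_length(s: str) -> int:
    """Number of code points after NFC normalisation."""
    return len(unicodedata.normalize("NFC", s))

def base_char_count(s: str) -> int:
    """Count code points that are not combining marks.

    Simplification: this treats every code point with a non-zero
    canonical combining class as part of the preceding character.
    It is not a full grapheme-cluster algorithm.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

a_umlaut_precomposed = "\u00e4"   # ä as a single code point
a_umlaut_combining = "a\u0308"    # a + COMBINING DIAERESIS
a_underlined = "A\u0332"          # A + COMBINING LOW LINE

# Raw code point counts differ for the two spellings of ä.
print(len(a_umlaut_precomposed), len(a_umlaut_combining))

# NFC composes ä into one code point, but has no precomposed
# form to use for the underlined A.
print(nfc_length(a_umlaut_combining), nfc_length(a_underlined))

# Ignoring combining marks gives one "character" in every case.
print(base_char_count(a_umlaut_combining), base_char_count(a_underlined))
```

Under these assumptions, NFC alone gives a length of 1 for ä but still 2 for the underlined A, while counting only base characters gives 1 for both. The second approach is closer to the outcome I am asking for, though it leaves open the question of what ⎕UCS should return for such a character.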
