Everybody wants to talk about handling APL characters. I'm for that too,
but first we need to be clear about how to handle UTF-8 or UTF-whatever.
The problem I am trying to point out is that the characters in _128{.a. fall
in a no-man's land. They are ambiguous. Sometimes they are treated like
8-bit extended ASCII. Sometimes they are treated like the bytes of a
multi-byte UTF-8 sequence.
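
To make the ambiguity concrete, here is a small sketch in Python rather than J, so it says nothing about how J itself stores or displays these values. It assumes that "8-bit extended ASCII" means Latin-1 (ISO 8859-1); that is my assumption, and other 8-bit code pages would give different letters.

```python
# Byte 254 is one of the characters in _128{.a. (values 128..255).
single = bytes([254])

# Read as 8-bit extended ASCII (Latin-1 assumed), it is the letter thorn.
as_latin1 = single.decode("latin-1")             # U+00FE, thorn

# The UTF-8 encoding of that same letter is two bytes, both above 127.
thorn_utf8 = list("\u00fe".encode("utf-8"))      # [195, 190]

# Those two bytes, read as Latin-1 instead, are two unrelated characters.
misread = bytes([195, 190]).decode("latin-1")    # U+00C3 then U+00BE

print(as_latin1, thorn_utf8, misread)
```

Nothing in the bytes themselves says which reading is intended. That has to come from somewhere else.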
   u,U
þþ
shows how display got confused. Is it supposed to display UTF-8? Or is it
supposed to display 8-bit extended ASCII? It looks as though it hit an error
while trying to display it as UTF-8, so it fell back to 8-bit extended ASCII.
": output is always literal. So
   #":u,U
6
   a.i.":u,U
195 131 194 190 195 190
switched all the 8-bit extended ASCII to UTF-8. But sometimes it just puts
in � when it can't figure out what to do. Maybe it should have displayed
the 8-bit extended ASCII instead. The trouble is that the character þ is
ambiguous.
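
The six numbers above can be reproduced outside J. The sketch below, in Python, rests on an assumption, because the definitions of u and U are not shown here: that u,U amounts to three characters with code points 195, 190 and 254, each of which is then encoded separately as UTF-8. It is one reading that fits the output, not a statement of what J does internally.

```python
# Assumed code points behind u,U (hypothetical, inferred from the output).
code_points = [195, 190, 254]

# Treat each value as a character of its own, then encode the result as UTF-8.
text = "".join(chr(c) for c in code_points)
encoded = list(text.encode("utf-8"))
print(encoded)                     # [195, 131, 194, 190, 195, 190]

# A decoder that substitutes instead of failing yields U+FFFD for byte 254.
replaced = bytes([254]).decode("utf-8", errors="replace")
print(replaced == "\ufffd")        # True
```

Under that reading, the pair 195 190 was not recognised as an existing UTF-8 encoding of þ. Each byte was encoded again.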
The reason 7 u: 254{a. is an error is that 7 u: specifically takes
UTF-8 or ASCII as its right argument. 254{a. is neither. It is what I have
been calling 8-bit extended ASCII.
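
The same distinction can be checked with a strict UTF-8 decoder. The helper below is a Python sketch, and the name is_utf8 is hypothetical. It only illustrates why a lone byte 254 is rejected while ASCII and well-formed UTF-8 are accepted. It does not claim to mirror how 7 u: is implemented.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (plain ASCII included)."""
    try:
        data.decode("utf-8")   # strict decoding raises on malformed input
        return True
    except UnicodeDecodeError:
        return False

print(is_utf8(b"abc"))              # True: ASCII is a subset of UTF-8
print(is_utf8(bytes([195, 190])))   # True: the two-byte encoding of U+00FE
print(is_utf8(bytes([254])))        # False: 254 never occurs in UTF-8
```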
Before we can even hope to effectively deal with APL characters we need to
be very clear on how to handle UTF-8.