Yes, Bill. It simply treats each UTF-8 byte as a letter, which keeps the bytes of a multi-byte UTF-8 character together. Normally ;: treats each byte of a multi-byte UTF-8 character as a special character like + or -. Strange things can then happen: sometimes an invalid character is displayed, or the display guesses a Unicode code point for the single byte, as if it came from a pre-Unicode single-byte character set. That's not very desirable. The bytes of a UTF-8 character inside a comment or literal are already kept together, giving the expected result. This change simply makes UTF-8 bytes work the same way outside of comments and literals.
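As a rough illustration of the idea, here is a minimal sketch in Python rather than J. It is not J's actual ;: state machine: the names classify and split_words and the utf8_as_letter switch are hypothetical, and the word rules are deliberately simplified to letters, spaces, and everything else. It only shows how classing bytes of 0x80 and above as letters keeps a multi-byte character in one word.

```python
def classify(byte, utf8_as_letter):
    """Return a coarse character class for one byte (hypothetical classes)."""
    if byte >= 0x80:
        # Lead and continuation bytes of multi-byte UTF-8 characters.
        return "letter" if utf8_as_letter else "other"
    ch = chr(byte)
    if ch.isalnum() or ch == "_":
        return "letter"
    if ch.isspace():
        return "space"
    return "other"


def split_words(text, utf8_as_letter):
    """Split text into words, working byte by byte on its UTF-8 encoding."""
    words = []
    current = bytearray()
    for byte in text.encode("utf-8"):
        kind = classify(byte, utf8_as_letter)
        if kind == "letter":
            # Letters accumulate into the current word.
            current.append(byte)
            continue
        if current:
            # Any non-letter ends the current word.
            words.append(bytes(current))
            current = bytearray()
        if kind == "other":
            # Special characters become one-byte words of their own.
            words.append(bytes([byte]))
    if current:
        words.append(bytes(current))
    return words
```

With utf8_as_letter set to False, the two bytes of a character such as α come back as two separate one-byte words, neither of which is valid UTF-8 on its own. With it set to True, they stay together as a single word that decodes cleanly.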
As I stated earlier, I don't know where J intends to go with Unicode. This could open up the discussion of supporting APL characters as primitives again. I hope not. Personally, I like the strictly ASCII definitions of primitives. But I noticed that the Android version of J supports UTF-8 names, like setting iota to i. and using it as a named verb. Not sure what to think of that. Unicode code points are a mixture of characters from many languages, tokens, and symbols. I don't see any order that distinguishes between them.

One of the things that the support of line feeds in ;: provides is the ability to process all kinds of non-J data, which might include UTF-8 that is not part of a literal or J comment. Adding this support for UTF-8 might make ;: more useful for such data.

Unicode and UTF-x are problematic to deal with in J, as a UTF-x character may take more than one item and J does not deal with multiple items for a character. When I deal with text that may include UTF-8, such as text from the internet, I immediately convert it to UTF-16 or UTF-32, hoping to avoid multiple items representing a character. To me that is much easier than having to code around multiple items representing a character. This seemed to me a simple way to get ;: to handle UTF-8 as one might expect. Just a thought.
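To make the "multiple items per character" point concrete, here is a small Python sketch (not J; the function names are hypothetical) that counts how many items the same text occupies in each encoding. It also shows why converting to UTF-16 is only a hope: characters outside the Basic Multilingual Plane still take two UTF-16 items, while UTF-32 always uses one item per code point.

```python
import struct


def utf8_items(text):
    """One item per UTF-8 byte."""
    return list(text.encode("utf-8"))


def utf16_items(text):
    """One item per 16-bit UTF-16 code unit (little-endian, no BOM)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def utf32_items(text):
    """One item per Unicode code point."""
    return [ord(ch) for ch in text]
```

For example, the APL iota U+2373 is three items in UTF-8 and one item in both UTF-16 and UTF-32, whereas U+1F600 is four items in UTF-8, two in UTF-16 (a surrogate pair), and one in UTF-32.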
