And more

-------- Original Message --------
Subject: Re: [Jchat] APL character support - skins
Date: Thu, 27 Feb 2014 17:05:57 -0500
From: Robert Bernecky <[email protected]>
To: [email protected]

I have been trying to peddle that same idea for about twenty years now, without people getting excited about it.

Basically, rather than get into arguments, oops, I mean "discussions", about what glyph maps to what, the way I would do it is as a "skin", much like the skins you can wrap around audio-playing apps and the like: If you don't like ASCII, then map the display and entry to an APLish view. Or perhaps to a drag-and-drop, connect-the-primitives view, as some AV studio patchboard apps do.

Bob

On 14-02-27 04:56 PM, Skip Cave wrote:

Just my two cents worth...

As an old APL (occasional) programmer, I always wanted a way to flip a switch in the J editor and turn J's 2-character primitives into APL characters (where appropriate), and either leave J's unique verbs alone, have the community decide on an appropriate single glyph, or let me pick a symbol for those myself. Then I could always flip that switch in the editor back, and see the actual J code, any time I wanted.

For me, it was never about how many characters I had to type. It was about what I saw when I looked at the code. IMHO, the APL single glyphs just made the functionality of programs much easier to grasp as I read through them.

If I am entering code and the switch is in APL mode, I could just type the actual J 2-character primitives, and the one-character APL symbol would appear on the screen.

When sending code around, I can always send the normal ASCII J representation (like sending the compiled binaries of a program), and the receiver of the code would have the option of looking at the J code in its native form, or viewing the APL-like symbols.

I'm sure this plan has many (undiscovered by me) flaws, but it is my dream...

Skip

Skip Cave
Cave Consulting LLC

On Thu, Feb 27, 2014 at 1:03 PM, Don Guinn <[email protected]> wrote:

This discussion started out on using APL characters as executable in J. I'm not sure I would want to make many equivalences between APL symbols and J primitives; however, representing APL characters and international characters gets into the way J handles these characters with the character types literal, unicode and UTF-8. Those not interested can bail out now, as the rest is kind of boring, but it is my soap-box.

About the time mini-computers and personal computers became common, 7-bit ASCII was a well-established standard. But by this time computers had standardized on 8 bits to the character. The extra bit allowed for supporting international characters while still fitting in a byte.
In addition, APL used those extra characters to support APL characters. But this led to confusion, since those characters varied between countries and systems.

Unicode was created to attempt to clean this mess up. It took the 7-bit ASCII and a fairly accepted version of the 8-bit extended ASCII and added leading zeros up to 32 bits. Now there is plenty of room to support many languages in a compatible manner.

Enter UCS Transformation Format, in particular UTF-8. There are many problems with Unicode stored this way, as it makes ASCII files much larger and slower to send over slow communication lines. And there is the endian issue between different computers. UTF-8 is an ingenious technique to compress unicode in a manner that is completely compatible with 7-bit ASCII. The endian problem is eliminated. It is not compatible with 8-bit ASCII extensions: 7-bit ASCII text looks identical to UTF-8 text, but 8-bit extended ASCII text does not. Those characters become two bytes each under the UTF-8 compression algorithm.

J converts literal to unicode by simply putting a zero byte in front, extending it to the 16-bit version of Unicode implemented in Windows and Unix. This is perfectly valid, as the numeric values of the first 256 Unicode letters match the 8-bit ASCII extension.

UTF-8 assumes that the _128{.a. characters in a literal are part of the compression algorithm and do not represent extended ASCII. But J treats UTF-8 as literal, making it impossible to tell whether those characters represent extended ASCII or UTF-8 compression.

UTF-8 is a compressed version of Unicode that J fits in literal. J treats literal as 8-bit extended ASCII when combining with and converting to/from unicode (wide). It treats literal as UTF-8 when entered from the keyboard and displayed. Got a bit of an inconsistency here.

   U =: 7 u: u =: 'þ'
   3!:0 u   NB. u is literal
2
   3!:0 U   NB. U is unicode
131072
   #u   NB. u takes 2 atoms
2
   #U   NB. U takes 1 atom
1
   'abc',u   NB. ASCII literals catenate with UTF-8
abcþ
   'abc',U   NB. ASCII literals catenate with unicode
abcþ
   u,U   NB. UTF-8 literals do not catenate well with unicode
þþ
   a.i.u,U   NB. Here we have þ in two forms
195 190 254

So, when programming in J one must never mix UTF-8 and unicode without being extremely careful and aware of what can happen.

It is easiest to use ASCII and UTF-8 together. Not a problem, as one cannot get any unicode into J without specifically converting to unicode using u: . The alternative is to make sure all text that might contain UTF-8 is converted to unicode. That can be difficult at times.

The trouble with mixing ASCII and UTF-8 is that J primitives work on the atoms of literal. Any UTF-8 bytes are treated as 8-bit extended ASCII. Counting characters and reshaping fail with UTF-8. Searching for UTF-8 characters is harder. An example of character counting failing with UTF-8 is the display of boxed literals.

   <u
+--+
|þ|
+--+

Notice that þ is treated as two characters but displays as one.

I choose to make sure everything that might contain UTF-8 is run through 7 u: , which will convert it to unicode if it contains any UTF-8 and leave it literal otherwise. Now all the J primitives work as expected. A character fits in an atom. I never worry about the possibility of UTF-8 characters being garbled. When I'm through, I simply convert my final result back to UTF-8.

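The round trip described above (convert with 7 u: , work on whole characters, convert back at the end) might look like the following minimal sketch. It assumes a J session where typed text arrives as UTF-8. The names s, w and r are purely illustrative, and 8 u: (J's conversion to UTF-8) is an assumption about how the final step would be done, since the message names only 7 u: .

```j
   s =: 'þing'        NB. literal as typed: UTF-8, so þ occupies two atoms
   #s
5
   w =: 7 u: s        NB. unicode (wide): one atom per character
   #w
4
   3!:0 w             NB. 131072 is the unicode type
131072
   r =: 8 u: |. w     NB. reverse whole characters, then convert back to UTF-8
   3!:0 r             NB. 2 is the literal type again
2
   #r
5
```

Reversing s directly with |. would instead reverse the individual bytes and split the two-byte encoding of þ, which is the kind of garbling the conversion is meant to avoid.
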
--
Robert Bernecky
Snake Island Research Inc
18 Fifth Street
Ward's Island
Toronto, Ontario M5J 2B9
[email protected]
tel: +1 416 203 0854

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
