And more

-------- Original Message --------
Subject:        Re: [Jchat] APL character support - skins
Date:   Thu, 27 Feb 2014 17:05:57 -0500
From:   Robert Bernecky <[email protected]>
To:     [email protected]



I have been trying to peddle that same idea for about
twenty years now, without people getting excited about it.

Basically, rather than get into arguments, oops I mean
"discussions" about what glyph maps to what, the way I would
do it is as a "skin", much like the skins you can wrap around
audio-playing apps and the like: If you don't like ASCII,
then map the display and entry to an APLish view.
Or perhaps to a drag-and-drop view in which you connect the
primitives, as some AV studio patchboard apps do.

Bob

On 14-02-27 04:56 PM, Skip Cave wrote:
 Just my two cents worth...

 As an old APL (occasional) programmer, I always wanted a way to flip a
 switch in the J editor and turn J's 2-character primitives into APL
 characters (where appropriate), and either leave J's unique verbs alone,
 have the community decide on an appropriate single glyph, or let me pick a
 symbol for those myself. Then I could always flip that switch in the editor
 back, and see the actual J code, any time I wanted.

 For me, it was never about how many characters I had to type. It was about
 what I saw, when I looked at the code. IMHO, the APL single glyphs just
 made the functionality of programs much easier to grasp as I read through
 them.

 If I am entering code and the switch is in APL mode, I could just type
 the actual J 2-character primitives, and the one-character APL symbol would
 appear on the screen.

 When sending code around, I can always send the normal ASCII J
 representation (like sending the compiled binaries of a program), and the
 receiver of the code would have the option of looking at the J code in its
 native form, or viewing the APL-like symbols.
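
 A rough sketch of the switch described above, written in Python rather than J
 purely for illustration. The glyph table is a hypothetical handful of familiar
 J/APL correspondences, and the names `J_TO_APL`, `to_apl_view` and
 `to_j_source` are made up for this example. The substitution is naive: it
 ignores strings and comments and assumes the source contains no APL glyphs.

```python
# Hypothetical display table: a few J primitives and the APL glyphs
# they would be shown as. A real skin would need an agreed-upon table.
J_TO_APL = {
    "i.": "⍳",
    "<.": "⌊",
    ">.": "⌈",
    "|.": "⌽",
    "$": "⍴",
    "*": "×",
    "%": "÷",
}
APL_TO_J = {glyph: prim for prim, glyph in J_TO_APL.items()}


def to_apl_view(j_source: str) -> str:
    """Show J source with APL glyphs (naive: ignores strings and comments)."""
    view = j_source
    # Replace longer primitives first so two-character ones are handled whole.
    for prim in sorted(J_TO_APL, key=len, reverse=True):
        view = view.replace(prim, J_TO_APL[prim])
    return view


def to_j_source(apl_view: str) -> str:
    """Flip the switch back: recover the ASCII J source from the view."""
    source = apl_view
    for glyph, prim in APL_TO_J.items():
        source = source.replace(glyph, prim)
    return source


shown = to_apl_view("2 3 $ i. 6")   # what the editor would display
stored = to_j_source(shown)         # what would be saved or mailed around
```

 Only the view changes; the stored text stays plain ASCII J, which is what
 makes flipping back and forth lossless in this sketch.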

 I'm sure this plan has many (undiscovered by me) flaws, but it is my
 dream...

 Skip



 Skip Cave
 Cave Consulting LLC


 On Thu, Feb 27, 2014 at 1:03 PM, Don Guinn <[email protected]> wrote:

 This discussion started out on using APL characters as executable in J. I'm
 not sure I would want to make many equivalences between APL symbols and J
 primitives; however, representing APL characters and international
 characters gets into the way J handles these characters with the character
 types literal, unicode and UTF-8.

 Those not interested can bail out now, as the rest is kind of boring, but it
 is my soapbox.

 About the time minicomputers and personal computers became common, 7-bit
 ASCII was a well-established standard. But by this time computers had
 standardized on 8 bits to the character. The extra bit allowed for
 supporting international characters while still fitting in a byte. In
 addition, APL used those extra code points for APL characters. But this led
 to confusion, since those characters varied between countries and systems.

 Unicode was created to attempt to clean this mess up. It took 7-bit ASCII
 and a widely accepted 8-bit extension of it (ISO 8859-1, Latin-1) as its
 first 256 code points, in effect adding leading zeros to widen each
 character to 16 or 32 bits. Now there is plenty of room to support many
 languages in a compatible manner.

 Enter the UCS Transformation Formats, in particular UTF-8. Fixed-width
 Unicode has its problems: it makes ASCII files much larger and slower to
 send over slow communications lines, and there is the endian issue between
 different computers. UTF-8 is an ingenious technique to compress unicode in
 a manner that is completely compatible with 7-bit ASCII, and it eliminates
 the endian problem. It is not compatible with the 8-bit ASCII extensions:
 7-bit ASCII text looks identical to UTF-8 text, but 8-bit extended ASCII
 text does not. Those characters become two bytes each under the UTF-8
 compression algorithm.
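
 To make the byte counts concrete, here is a small check in Python (used only
 as a neutral illustration, not J). It assumes "8-bit extended ASCII" means
 Latin-1, which is the extension whose values match the first 256 Unicode
 code points.

```python
ascii_text = "abc"
thorn = "þ"  # U+00FE, decimal 254

# 7-bit ASCII text is byte-for-byte identical in ASCII and UTF-8.
same_for_ascii = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# An extended character is one byte in Latin-1 but two bytes in UTF-8.
latin1_bytes = list(thorn.encode("latin-1"))  # [254]
utf8_bytes = list(thorn.encode("utf-8"))      # [195, 190]
```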

 J converts literal to unicode by simply putting a zero byte in front,
 extending it to the 16-bit version of Unicode implemented in Windows
 and Unix. This is perfectly valid, as the numeric values of the first 256
 Unicode characters match the 8-bit ASCII extension. UTF-8, however, assumes
 that the _128{.a. characters in a literal are part of its compression
 algorithm, not extended ASCII. But J stores UTF-8 as literal, making
 it impossible to tell whether those characters represent extended ASCII or
 UTF-8 compression.

 UTF-8 is a compressed version of Unicode that J fits in literal. J treats
 literal as 8-bit extended ASCII when combining and converting to/from
 unicode (wide). It treats literal as UTF-8 when entered from the keyboard
 and displayed. Got a bit of an inconsistency here.

     U =: 7 u: u =: 'þ'

     3!:0 u   NB. u is literal
 2

     3!:0 U   NB. U is unicode
 131072

     #u       NB. u takes 2 atoms
 2

     #U       NB. U takes 1 atom
 1

     'abc',u  NB. ASCII literals catenate with UTF-8
 abcþ

     'abc',U  NB. ASCII literals catenate with unicode
 abcþ

     u,U      NB. UTF-8 literals do not catenate well with unicode
 þþ

     a.i.u,U  NB. Here we have þ in two forms
 195 190 254

 So, when programming in J one must never mix UTF-8 and unicode without
 being extremely careful and aware of what can happen. It is easiest to use
 ASCII and UTF-8 together. Not a problem as one cannot get any unicode into
 J without specifically converting to unicode using u: .
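
 The hazard of mixing the two forms can be reproduced outside J. The following
 Python sketch is an analogy, not J's implementation: it assumes that widening
 a literal by zero-extension behaves like decoding the bytes as Latin-1.

```python
utf8_bytes = "þ".encode("utf-8")  # the two bytes 195 190

# Read as UTF-8, the two bytes are one character.
decoded = utf8_bytes.decode("utf-8")

# Zero-extending each byte (Latin-1 decoding) gives two unrelated
# characters instead of one.
widened = utf8_bytes.decode("latin-1")

# Catenating the widened bytes with a real wide character leaves
# the same letter present in two different forms.
mixed = widened + "þ"
code_points = [ord(ch) for ch in mixed]  # [195, 190, 254]
```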

 The alternative is to make sure all text that might contain UTF-8 is
 converted to unicode. That can be difficult at times.

 The trouble with mixing ASCII and UTF-8 is that J primitives work on the
 atoms of literal. Any UTF-8 bytes are treated as 8-bit extended ASCII, so
 counting characters and reshaping fail with UTF-8, and searching for UTF-8
 characters is harder. An example of character counting failing with UTF-8
 is the display of boxed literals.

     <u
 +--+
 |þ|
 +--+

 Notice that þ is treated as two characters but displays as one.
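
 The misaligned box comes from sizing the border by atoms (bytes) rather than
 by characters. A minimal Python sketch of that effect follows; `boxed` is a
 hypothetical helper written for this illustration, not how J draws boxes.

```python
def boxed(text: str, width: int) -> str:
    """Draw a one-line box whose border is `width` characters wide."""
    border = "+" + "-" * width + "+"
    return "\n".join([border, "|" + text + "|", border])


text = "þ"
byte_width = len(text.encode("utf-8"))  # 2: counts UTF-8 bytes
char_width = len(text)                  # 1: counts characters

too_wide = boxed(text, byte_width)  # border sized for two columns
fits = boxed(text, char_width)      # border matches what is displayed
```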

 I choose to make sure everything that might contain UTF-8 is run through 7
 u: which will convert it to unicode if it contains any UTF-8, or leave it
 literal otherwise. Now all the J primitives work as expected. A character
 fits in an atom. I never worry about the possibility of UTF-8 characters
 being garbled. When I'm through, simply convert my final result back to
 UTF-8.
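
 The same decode, process, re-encode pattern can be sketched in Python, shown
 here only as an analogy to the 7 u: workflow. `reverse_text` is a hypothetical
 example operation chosen because it breaks visibly when applied to raw bytes.

```python
def reverse_text(utf8_bytes: bytes) -> bytes:
    """Decode first, operate on whole characters, then re-encode."""
    text = utf8_bytes.decode("utf-8")
    return text[::-1].encode("utf-8")


data = "aþb".encode("utf-8")

byte_count = len(data)                  # 4: þ occupies two bytes
char_count = len(data.decode("utf-8"))  # 3: one item per character

reversed_ok = reverse_text(data).decode("utf-8")  # "bþa"

# Reversing the raw bytes splits þ and leaves invalid UTF-8.
try:
    data[::-1].decode("utf-8")
    bytewise_reverse_valid = True
except UnicodeDecodeError:
    bytewise_reverse_valid = False
```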
 ----------------------------------------------------------------------
 For information about J forums see http://www.jsoftware.com/forums.htm



--
Robert Bernecky
Snake Island Research Inc
18 Fifth Street
Ward's Island
Toronto, Ontario M5J 2B9

[email protected]
tel: +1 416 203 0854


