What are sj/mj? They are not part of the JE, right? I think you are
saying that (x ;: y) can be used to parse well-formed UTF-8 with a small
change to the input translation.
Henry Rich
On 11/20/2020 10:17 AM, Don Guinn wrote:
The sequential machine (;:) does not do well when dealing with UTF-8. It
works well within comments (NB.) and literals ('⌹'), but outside those cases
it makes a mess, splitting each multi-byte character into separate words.
Given some of the changes to ;: in the beta, it seems desirable to have
UTF-8 handled outside comments and literals the same way it is handled
inside them. A simple change to mj accomplishes that: assigning the value 2
(the class for letters) to the byte range 128+i.128 makes UTF-8 bytes behave
like the letters a-z and A-Z.
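
To see the effect of that amendment in isolation, here is a minimal sketch. It uses a cut-down map called mjdemo (a hypothetical name, not the real mj) in which only A-Z and a-z are class 2 and every other byte is class 0. It assumes literals hold UTF-8 bytes, so 'π' occupies two bytes.

```j
NB. Hypothetical cut-down character map: class 2 = letters, 0 = other.
NB. The real mj has further classes (space, digits, quote, and so on).
mjdemo=: 2 ((a.i.'Aa')+/i.26)} 256$0
(a.i.'Hπ'){mjdemo            NB. 2 0 0 : both bytes of π class as "other"
NB. The proposed change: bytes 128..255 become class 2
mjdemo=: 2 (128+i.128)}mjdemo
(a.i.'Hπ'){mjdemo            NB. 2 2 2 : both bytes of π now class as letters
```

With every byte of a multi-byte character in the letter class, the state table treats the whole sequence as part of one name-like word.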
I don't know where J will be going with UTF-8 and other Unicode handling,
but this seems to me to help with the handling of UTF-8 in the sequential
machine.
Example shown below:
NB. Definitions for sj and mj are not shown;
NB. they are as in the current beta.
NB. A noun to show the handling of UTF-8 in ;:
test=:{{)n
The symbol for the Euro is ₠
Other symbols like π show up also
How about ⌹ in APL
Common expressions like 'H₂O' for water
Common expressions like H₂O for water
}}
NB. How ;: in beta handles it
,.<;.2(0;sj;mj);:test
+-----------------------------------------------+
|+---+------+---+---+----+--+-+-+-+-+           |
||The|symbol|for|the|Euro|is|â|‚| | |           |
|+---+------+---+---+----+--+-+-+-+-+           |
+-----------------------------------------------+
|+-----+-------+----+-+-+----+--+----+-+         |
||Other|symbols|like|Ï|€|show|up|also| |         |
|+-----+-------+----+-+-+----+--+----+-+         |
+-----------------------------------------------+
|+---+-----+-+-+-+--+---+-+                     |
||How|about|â|Œ|¹|in|APL| |                     |
|+---+-----+-+-+-+--+---+-+                     |
+-----------------------------------------------+
|+------+-----------+----+-----+---+-----+-+    |
||Common|expressions|like|'H₂O'|for|water| |    |
|+------+-----------+----+-----+---+-----+-+    |
+-----------------------------------------------+
|+------+-----------+----+-+-+-+-+-+---+-----+-+|
||Common|expressions|like|H|â|‚|‚|O|for|water| ||
|+------+-----------+----+-+-+-+-+-+---+-----+-+|
+-----------------------------------------------+
NB. Assigning UTF-8 bytes to the letter class (2)
mj=: 2 (128+i.128)}mj
NB. How UTF-8 is now handled
,.<;.2(0;sj;mj);:test
+-------------------------------------------+
|+---+------+---+---+----+--+-+-+           |
||The|symbol|for|the|Euro|is|₠| |           |
|+---+------+---+---+----+--+-+-+           |
+-------------------------------------------+
|+-----+-------+----+-+----+--+----+-+       |
||Other|symbols|like|π|show|up|also| |       |
|+-----+-------+----+-+----+--+----+-+       |
+-------------------------------------------+
|+---+-----+-+--+---+-+                     |
||How|about|⌹|in|APL| |                     |
|+---+-----+-+--+---+-+                     |
+-------------------------------------------+
|+------+-----------+----+-----+---+-----+-+|
||Common|expressions|like|'H₂O'|for|water| ||
|+------+-----------+----+-----+---+-----+-+|
+-------------------------------------------+
|+------+-----------+----+---+---+-----+-+  |
||Common|expressions|like|H₂O|for|water| |  |
|+------+-----------+----+---+---+-----+-+  |
+-------------------------------------------+
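
The one-byte boxes in the beta output line up with the raw bytes of each character. As a rough check, assuming the session stores literals as UTF-8 bytes (the usual case for J source text):

```j
# 'H₂O'                      NB. 5 : H, three bytes for ₂, then O
a. i. 'H₂O'                  NB. 72 226 130 130 79
a. i. 'π'                    NB. 207 128
NB. Every byte of these multi-byte characters falls in 128+i.128
*./ 128 <: a. i. 'π₂⌹₠'      NB. 1
```

The five bytes of H₂O correspond to the five boxes H, â, ‚, ‚, O in the unmodified output, which is why classing 128+i.128 as letters is enough to keep the word together.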
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm