The sequential machine (;:) does not do well when dealing with UTF-8. It
works well within comments (NB.) and literals ('⌹'), but outside those cases
it splits each multi-byte character into separate bytes and makes a mess.


Given some of the changes to ;: in the beta, it seems desirable to have
UTF-8 handled outside of comments and literals the same way it is handled
inside them. A simple change to mj accomplishes that: assign the value 2
(the class for letters) to the range 128+i.128, so that UTF-8 bytes are
treated like the letters a-z and A-Z.
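
As a quick check of why 128+i.128 is the right range (this is my own
illustration, not part of the sj/mj definitions): every byte of a
multi-byte UTF-8 character has a value of 128 or more, so those are the
entries of mj that need the letter class. The values in the comments are
the UTF-8 encodings of these characters, assuming the literals are held as
UTF-8 bytes:

NB. Byte values of two multi-byte UTF-8 characters
a. i. 'π'                   NB. 207 128
a. i. '⌹'                   NB. 226 140 185
NB. Every byte of the test symbols is at least 128
*./ 128 <: a. i. 'π⌹₂₠'     NB. 1
NB. The range used in the amend covers bytes 128 through 255
(<./ , >./) 128+i.128       NB. 128 255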


I don't know where J is going with UTF-8 and other Unicode handling, but
this seems to me to help the sequential machine handle UTF-8.


Example shown below:

NB. Definitions of sj and mj are not shown; they are the same as
NB. in the current beta.

NB. A noun to show the handling of UTF-8 in ;:
test=:{{)n
The symbol for the Euro is ₠
Other symbols like π show up also
How about ⌹ in APL
Common expressions like 'H₂O' for water
Common expressions like H₂O for water
}}

NB. How ;: in the beta handles it
,.<;.2(0;sj;mj);:test
+-----------------------------------------------+
|+---+------+---+---+----+--+-+-+-+-+           |
||The|symbol|for|the|Euro|is|â|‚| | |           |
|+---+------+---+---+----+--+-+-+-+-+           |
+-----------------------------------------------+
|+-----+-------+----+-+-+----+--+----+-+        |
||Other|symbols|like|Ï|€|show|up|also| |        |
|+-----+-------+----+-+-+----+--+----+-+        |
+-----------------------------------------------+
|+---+-----+-+-+-+--+---+-+                     |
||How|about|â|Œ|¹|in|APL| |                     |
|+---+-----+-+-+-+--+---+-+                     |
+-----------------------------------------------+
|+------+-----------+----+-----+---+-----+-+    |
||Common|expressions|like|'H₂O'|for|water| |    |
|+------+-----------+----+-----+---+-----+-+    |
+-----------------------------------------------+
|+------+-----------+----+-+-+-+-+-+---+-----+-+|
||Common|expressions|like|H|â|‚|‚|O|for|water| ||
|+------+-----------+----+-+-+-+-+-+---+-----+-+|
+-----------------------------------------------+

NB. Assign UTF-8 bytes (128-255) to the letter class
mj=: 2 (128+i.128)}mj

NB. How UTF-8 is now handled
,.<;.2(0;sj;mj);:test
+-------------------------------------------+
|+---+------+---+---+----+--+-+-+            |
||The|symbol|for|the|Euro|is|₠| |            |
|+---+------+---+---+----+--+-+-+            |
+-------------------------------------------+
|+-----+-------+----+-+----+--+----+-+      |
||Other|symbols|like|π|show|up|also| |      |
|+-----+-------+----+-+----+--+----+-+      |
+-------------------------------------------+
|+---+-----+-+--+---+-+                     |
||How|about|⌹|in|APL| |                     |
|+---+-----+-+--+---+-+                     |
+-------------------------------------------+
|+------+-----------+----+-----+---+-----+-+|
||Common|expressions|like|'H₂O'|for|water| ||
|+------+-----------+----+-----+---+-----+-+|
+-------------------------------------------+
|+------+-----------+----+---+---+-----+-+  |
||Common|expressions|like|H₂O|for|water| |  |
|+------+-----------+----+---+---+-----+-+  |
+-------------------------------------------+
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
