Bill has hit the nail on the head.
For unboxed display, JE just sends the bytes to the front-end which does
whatever it wants with invalid UTF-8 sequences.
For boxed display, in order to get the boxing right JE has to predict
the number of character positions that will be taken up by each Unicode
character, so it converts anything that looks like UTF-8 to Unicode
characters. Whatever it chooses to do with invalid UTF-8 might be
different from what the front-end does.
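
To make the width problem concrete, here is a minimal Python sketch (not JE's actual code; both policies and their names are hypothetical). It counts the character positions of a row of bytes under two possible treatments of invalid UTF-8. Counting code points is itself only an approximation of display width, since wide and combining characters are ignored here.

```python
# Hypothetical illustration of two ways to predict the width of a row of bytes.
VALID = b"c\xe0\xb0\x9da"        # the UTF-8 bytes of 'c', U+0C1D, 'a': 5 bytes, 3 characters
BROKEN = b"\xe0\xb0\x9d\xe0"     # U+0C1D followed by a stray lead byte 224

def width_replace(row: bytes) -> int:
    """Policy A: substitute U+FFFD for invalid input, then count characters."""
    return len(row.decode("utf-8", errors="replace"))

def width_bytewise(row: bytes) -> int:
    """Policy B: if the row is not valid UTF-8, count every byte as one character."""
    try:
        return len(row.decode("utf-8"))
    except UnicodeDecodeError:
        return len(row)

print(len(VALID), width_replace(VALID), width_bytewise(VALID))
print(width_replace(BROKEN), width_bytewise(BROKEN))
```

The two policies agree on valid input and disagree on the broken row (2 positions versus 4), so a box drawn under one policy will not line up if the front-end renders under the other.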
My take on this is that it's not worth trying to fix.
Henry Rich
On 4/23/2017 3:54 AM, bill lam wrote:
The difficulty is converting an invalid UTF-8 sequence to Unicode
and then converting it back to the original invalid UTF-8 bytes.
How an invalid sequence is displayed depends on the front-end's font
engine.
I hope you can bear with it.
On Sun, 23 Apr 2017, robert therriault wrote:
Thanks Bill,
I thought it may be related and I also suspected that it might not be an easy
fix.
It does result in some strange situations where a UTF-8 sequence 224 176 157 is
interpreted one way in the first row of an array and differently in the second
row. I suppose that is the nature of UTF-8 shards. It is a messy business.
<2 6 $ 'cఝa'
┌──────┐
│cఝac │
│à°acà│
└──────┘
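
The byte arithmetic behind this display can be sketched in Python. This is a rough model, not what JE does internally: reshape_bytes and show_row are hypothetical helpers, and the per-byte fallback for a row that is not valid UTF-8 is only an assumption that happens to match the output above.

```python
from itertools import cycle, islice

def reshape_bytes(data: bytes, rows: int, cols: int) -> list:
    """Fill rows x cols bytes by cycling through data, in the spirit of J's dyadic $."""
    flat = bytes(islice(cycle(data), rows * cols))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def show_row(row: bytes) -> str:
    """Assumed policy: decode as UTF-8 if valid, otherwise one character per byte."""
    try:
        return row.decode("utf-8")
    except UnicodeDecodeError:
        return row.decode("latin-1")

# 99, then 224 176 157, then 97: the five UTF-8 bytes of the three-character string
rows = reshape_bytes(b"c\xe0\xb0\x9da", 2, 6)
for row in rows:
    print(list(row), repr(show_row(row)))
```

The five bytes are cycled to fill twelve positions, so the second row is 224 176 157 97 99 224. It ends in a lead byte with no continuation bytes, which makes the row as a whole invalid UTF-8 even though it begins with the same three bytes that displayed correctly in the first row.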
Cheers, bob
On Apr 23, 2017, at 12:05 AM, bill lam wrote:
_3 s: 2 5 $ 'cb鲨a'
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm