Bill has hit the nail on the head.

For unboxed display, JE just sends the bytes to the front end, which does whatever it wants with invalid UTF-8 sequences.

For boxed display, JE has to predict how many character positions each Unicode character will occupy so that it can draw the box correctly. To do that, it converts anything that looks like UTF-8 into Unicode characters. Whatever it chooses to do with invalid UTF-8 may differ from what the front end does.
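The Python sketch below shows the kind of conversion described above. It is not JE's actual code. Valid UTF-8 is decoded to code points, and any byte that cannot be decoded is shown as the Latin-1 character with the same value. Several parts are assumptions made for illustration only: the error-handler name `latin1fallback`, the Latin-1 policy itself, and the width rule (two columns for East Asian Wide/Fullwidth characters, one otherwise). The Latin-1 policy was chosen because byte 224 is `à` in Latin-1, which matches the `à` in the quoted `2 6 $ 'cఝa'` output. That match is an inference, not confirmed JE behaviour.

```python
import codecs
import unicodedata


def _latin1_fallback(exc):
    # Hypothetical policy: show each undecodable byte as the Latin-1
    # character with the same value (byte 224 -> 'à').
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return bad.decode("latin-1"), exc.end


codecs.register_error("latin1fallback", _latin1_fallback)


def to_display_chars(raw: bytes) -> str:
    """Decode valid UTF-8 normally; undecodable bytes fall back to Latin-1."""
    return raw.decode("utf-8", errors="latin1fallback")


def display_width(text: str) -> int:
    """Assumed column count: East Asian Wide/Fullwidth = 2, everything else = 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


row2 = bytes([224, 176, 157, 97, 99, 224])  # second row of  2 6 $ 'cఝa'
chars = to_display_chars(row2)
print(chars, display_width(chars))  # ఝacà 4
```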

My take on this is that it's not worth trying to fix.

Henry Rich

On 4/23/2017 3:54 AM, bill lam wrote:
The difficulty is converting an invalid UTF-8 sequence to Unicode
and then converting it back to the original invalid UTF-8 bytes.
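Python's codecs are used here only to illustrate the round-trip problem; nothing suggests JE works this way. With the `replace` policy, the stray byte becomes U+FFFD and the original byte is lost. Mapping the stray byte to `à` cannot be undone either, because `à` re-encodes as two different bytes. Python's `surrogateescape` handler carries each undecodable byte through as a lone surrogate in U+DC80 to U+DCFF, which is one known scheme that restores the original bytes exactly. The helper `round_trip` is hypothetical.

```python
def round_trip(raw: bytes, errors: str) -> bytes:
    """Decode UTF-8 to str with the given error policy, then encode back."""
    text = raw.decode("utf-8", errors=errors)
    # 'surrogateescape' must also be used on the way back to restore bytes;
    # with 'replace' the original bytes are already gone.
    back_errors = "surrogateescape" if errors == "surrogateescape" else "strict"
    return text.encode("utf-8", errors=back_errors)


shard = bytes([224, 176, 157, 97, 99, 224])  # ends in a lone 224

lossy = round_trip(shard, "replace")
print(list(lossy))   # [224, 176, 157, 97, 99, 239, 191, 189]: 224 became U+FFFD

exact = round_trip(shard, "surrogateescape")
print(exact == shard)  # True

# A Latin-1 fallback for the stray byte is not reversible either:
# 'à' re-encodes as 195 160, not 224.
print(list(bytes([224]).decode("latin-1").encode("utf-8")))  # [195, 160]
```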

How an invalid sequence is displayed depends on the front end's font
engine.

I hope you can bear with it.

Sun, 23 Apr 2017, robert therriault wrote:
Thanks Bill,

I thought it may be related and I also suspected that it might not be an easy 
fix.

It does result in some strange situations where a UTF-8 sequence 224 176 157 is 
interpreted one way in the first row of an array and differently in the second 
row. I suppose that is the nature of UTF-8 shards. It is a messy business.

      <2 6  $ 'cఝa'
┌──────┐
│cఝac  │
│ఝacà│
└──────┘
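The same bytes behave differently in each row because `2 6 $` fills 12 cells by cycling through the 5 bytes of `'cఝa'` (99 224 176 157 97). The second row therefore starts with the complete sequence 224 176 157 but ends with a lone 224, which is the first byte of the next ఝ. The Python sketch below mimics that cyclic fill for a byte string. `reshape_cyclic` and `is_valid_utf8` are hypothetical helpers, not J primitives.

```python
from typing import List


def reshape_cyclic(rows: int, cols: int, data: bytes) -> List[bytes]:
    """Mimic J's  rows cols $ data  on a byte string: fill row by row,
    cycling back to the start of data when it runs out."""
    if not data:
        raise ValueError("data must be non-empty")
    filled = bytes(data[i % len(data)] for i in range(rows * cols))
    return [filled[r * cols:(r + 1) * cols] for r in range(rows)]


def is_valid_utf8(row: bytes) -> bool:
    """True if the row decodes as UTF-8 without errors."""
    try:
        row.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


src = "c\u0c1da".encode("utf-8")  # 'cఝa' -> 99 224 176 157 97
for row in reshape_cyclic(2, 6, src):
    print(list(row), is_valid_utf8(row))
# [99, 224, 176, 157, 97, 99] True
# [224, 176, 157, 97, 99, 224] False
```

Applying the same helper to Bill's `2 5 $ 'cb鲨a'` (6 bytes) gives a second row that ends with only the first two of 鲨's three bytes.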

Cheers, bob
On Apr 23, 2017, at 12:05 AM, bill lam <[email protected]> wrote:

_3 s: 2 5  $ 'cb鲨a'
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

