Thanks, that explains a lot. Careful reading of your message made me
realize it is a feature.  If the characters are not UTF8, the code assumes
you wanted to display bytes.
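
As a rough model of that fallback (written in Python rather than J, only to show the byte arithmetic; decode_or_bytes is a made-up name, and treating the fallback as "each byte becomes the code point of the same value" is my assumption based on the numbers in this thread):

```python
# Second row of the reshaped example: the trailing 224 is a truncated lead byte.
row = bytes([224, 176, 157, 97, 99, 224])


def decode_or_bytes(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8; if that fails anywhere,
    reinterpret the WHOLE vector byte by byte (same as Latin-1 decoding)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


text = decode_or_bytes(row)
round_trip = list(text.encode("utf-8"))
print(round_trip)
# [195, 160, 194, 176, 194, 157, 97, 99, 195, 160]
```

The printed list is the same byte sequence that shows up in the boxed output quoted below, which is consistent with the whole row having been treated as bytes.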

It should be simple to write a _7 u: variant that inserts 16bfffd (the
Unicode replacement character) wherever 7 u: would signal a domain error.
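
To make the idea concrete, here is a sketch of the intended behaviour in Python (not J; decode_with_replacement is a hypothetical name, and this only illustrates the result I have in mind, not how a J implementation would look):

```python
def decode_with_replacement(data: bytes) -> str:
    """Hypothetical stand-in for the proposed verb: decode UTF-8 and put
    U+FFFD (16bfffd) wherever the input is not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


# Same malformed row as in the quoted example: valid text, then a lone 224.
row = bytes([224, 176, 157, 97, 99, 224])
repaired = decode_with_replacement(row)

print([hex(ord(ch)) for ch in repaired])
# ['0xc1d', '0x61', '0x63', '0xfffd']
print(list(repaired.encode("utf-8")))
# [224, 176, 157, 97, 99, 239, 191, 189]
```

The valid characters survive unchanged and only the truncated lead byte is replaced, at the cost of no longer being able to recover the original byte.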

On Sun, May 7, 2017, 10:39 PM bill lam <[email protected]> wrote:

> As far as I can recall, it works this way:
> if a rank-1 character vector is malformed utf8,
> the whole vector (not just the illegal
> characters) is converted byte by byte. In your
> example the second row
>   224 176 157 97 99 224
> is malformed because of the last 224, so it
> is converted to unicode in this way
>      7 u: 224 176 157 97 99 224
> à° acà
>
> After blanks are inserted for formatting, it is
> converted back to utf8
>      a. i. 8 u: 7 u: 224 176 157 97 99 224
> 195 160 194 176 194 157 97 99 195 160
>
> So the round trip does not look good. Ideally
> it should convert only the illegal subsequence, in
> this case the last 224 to 195 160, to repair it:
>
>    7 u: a.{~224 176 157 97 99 195 160
> ఝacà
>    a.i. 8 u: 7 u: a.{~224 176 157 97 99 195 160
> 224 176 157 97 99 195 160
>
>
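
The narrower repair described above, where only the illegal bytes are reinterpreted, can be sketched as follows. This is Python rather than J, the handler name "latin1_fallback" is made up for illustration, and it is only a model of the suggested behaviour, not of anything J currently does:

```python
import codecs


def latin1_fallback(err):
    """Error handler (hypothetical name): turn just the offending bytes into
    code points of the same value, then resume decoding after them."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end


codecs.register_error("latin1_fallback", latin1_fallback)

row = bytes([224, 176, 157, 97, 99, 224])
repaired = row.decode("utf-8", errors="latin1_fallback")
print(list(repaired.encode("utf-8")))
# [224, 176, 157, 97, 99, 195, 160]
```

The valid three-byte character is left alone and only the trailing 224 becomes 195 160, which is the repaired round trip shown in the example above.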
> Mon, 08 May 2017, Paul Jackson wrote:
> > I know the previous discussion concluded this wasn't worth fixing, but I've
> > been looking at the causes of damaged output. I've confirmed that the
> > visible appearance of truncated UTF8 characters is due to the environment.
> >    a.i. v0=. 'cఝa'
> > 99 224 176 157 97
> >    224 { a.
> > ʀ
> >    224 176{a.
> > ఊ
> >
> > However, you can also see faults in what default format provides. As an APL
> > developer, I assume it shares code with default output. My tests suggest
> > embedded failures are due to J.
> >    a.i.": <2 6$ v0
> > 16 26 26 26 26 26 26 18 32 32 32 32
> > 25 99 224 176 157 97 99 32 32 25 32 32
> > 25 195 160 194 176 194 157 97 99 195 160 25
> > 22 26 26 26 26 26 26 24 32 32 32 32
> >
> > Note that there is no 195 160 in the text, and it seems 224 176 157 has
> > become 194 176 194 157.  A further example of this behaviour is shown with
> > the next unicode character.
> >    a.i. v1=. 'cఞa'
> > 99 224 176 158 97
> >    a.i.": <2 6$ v1
> > 16  26  26   26   26   26   26  18  32  32  32  32
> > 25  99 224 176 158  97  99  32  32  25  32  32
> > 25 195 160 194 176 194 158  97  99 195 160  25
> > 22  26  26  26  26  26  26  24  32  32  32  32
> >
> > While it should be possible to fix these internal mistakes, there cannot be
> > a safe way to use verbs like
> > # $ { {. }. } |. |:
> > on UTF8 values, so I still don't know if it is worth fixing.
> >
> > However, running these tests made me realize default format converts
> > everything to UTF8. While the characters are not damaged by reshape, some
> > rows of enclosed arrays will end in blanks.
> >    a.i.": <2 4$7 u: v0
> > 16  26  26  26  26  18  32  32  32  32
> > 25  99 224 176 157  97  99  25  32  32
> > 25 224 176 157  97  99 224 176 157  25
> > 22  26  26  26  26  24  32  32  32  32
> > --
> >
> > Paul
> > 650-766-1863
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3

-- 

Paul
650-766-1863
