As far as I can recall, it works this way:
if a rank-1 character vector is malformed UTF-8,
the whole vector (not just the illegal
bytes) is converted byte by byte. In your
example the second row
  224 176 157 97 99 224
is malformed because of the trailing 224, so it
is converted to unicode in this way
     7 u: 224 176 157 97 99 224
ఝacà

After blanks are inserted for formatting, it is converted
back to UTF-8
     a. i. 8 u: 7 u: 224 176 157 97 99 224
195 160 194 176 194 157 97 99 195 160
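
To make the byte arithmetic easier to follow, here is a small Python model of
the behaviour as I described it. It is only a sketch of my recollection, not
J's actual implementation; the function name is made up for illustration, and
the Latin-1 fallback is my stand-in for "byte by byte".

```python
def to_unicode_whole_vector(data: bytes) -> str:
    """Decode as UTF-8; if the vector is malformed anywhere, map EVERY byte
    to the code point with the same value (assumed model of the fallback)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps byte value n to code point n, i.e. byte by byte.
        return data.decode("latin-1")


# 'c', U+0C1D, 'a' encoded as UTF-8: 99 224 176 157 97
v0 = "c\u0c1da".encode("utf-8")

# Cyclic reshape into 2 rows of 6 bytes, mimicking 2 6$ v0
rows = [bytes(v0[(r * 6 + c) % len(v0)] for c in range(6)) for r in range(2)]
second_row = rows[1]

text = to_unicode_whole_vector(second_row)
round_trip = list(text.encode("utf-8"))

print(list(second_row))        # [224, 176, 157, 97, 99, 224]
print([ord(ch) for ch in text])  # [224, 176, 157, 97, 99, 224]
print(round_trip)              # [195, 160, 194, 176, 194, 157, 97, 99, 195, 160]
```

The last line reproduces the bytes seen in the boxed output: every byte above
127 has been re-encoded as a two-byte sequence.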

So the round trip doesn't look pretty. Ideally
it should convert only the illegal subsequence, in
this case the last 224, to 195 160 as a repair

   7 u: a.{~224 176 157 97 99 195 160
ఝacà
   a.i. 8 u: 7 u: a.{~224 176 157 97 99 195 160
224 176 157 97 99 195 160
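
The "repair only the illegal bytes" idea can be sketched in Python as well.
This is a hypothetical illustration, not something J does today: it relies on
Python's standard surrogateescape error handler to mark each undecodable byte,
then turns each marker into the code point of the same value. The function
name is my own.

```python
def repair_malformed_utf8(data: bytes) -> bytes:
    """Keep valid UTF-8 sequences intact and re-encode only the illegal
    bytes as the code points of the same value (assumed repair policy)."""
    # surrogateescape maps each undecodable byte b to the lone surrogate
    # U+DC00 + b, so bad bytes land in the range U+DC80..U+DCFF.
    text = data.decode("utf-8", errors="surrogateescape")
    repaired = "".join(
        chr(ord(ch) - 0xDC00) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )
    return repaired.encode("utf-8")


row = bytes([224, 176, 157, 97, 99, 224])
repaired_row = list(repair_malformed_utf8(row))
print(repaired_row)  # [224, 176, 157, 97, 99, 195, 160]
```

Only the dangling 224 changes; the valid three-byte sequence 224 176 157
survives, which matches the J session above.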


Mon, 08 May 2017, Paul Jackson wrote:
> I know the previous discussion concluded this wasn't worth fixing, but I've
> been looking at the causes of damaged output. I've confirmed that the
> visible appearance of truncated UTF8 characters is due to the environment.
>    a.i. v0=. 'cఝa'
> 99 224 176 157 97
>    224 { a.
> ʀ
>    224 176{a.
> ఊ
> 
> However, you can also see faults in what default format provides. As an APL
> developer, I assume it shares code with default output. My tests suggest
> embedded failures are due to J.
>    a.i.": <2 6$ v0
> 16 26 26 26 26 26 26 18 32 32 32 32
> 25 99 224 176 157 97 99 32 32 25 32 32
> 25 195 160 194 176 194 157 97 99 195 160 25
> 22 26 26 26 26 26 26 24 32 32 32 32
> 
> Note that there is no 195 160 in the original text, and it seems 224 176 157
> has become 195 160 194 176 194 157.  A further example of this behaviour is
> shown in the next unicode character.
>    a.i. v1=. 'cఞa'
> 99 224 176 158 97
>    a.i.": <2 6$ v1
> 16  26  26   26   26   26   26  18  32  32  32  32
> 25  99 224 176 158  97  99  32  32  25  32  32
> 25 195 160 194 176 194 158  97  99 195 160  25
> 22  26  26  26  26  26  26  24  32  32  32  32
> 
> While it should be possible to fix these internal mistakes, there cannot be
> a safe way to use verbs like
> # $ { {. .} } |. |:
> on UTF8 values, so I still don't know if it is worth fixing.
> 
> However, running these tests made me realize default format converts
> everything to UTF8. While the characters are not damaged by reshape, some
> rows of enclosed arrays will end in blanks.
>    a.i.": <2 4$7 u: v0
> 16  26  26  26  26  18  32  32  32  32
> 25  99 224 176 157  97  99  25  32  32
> 25 224 176 157  97  99 224 176 157  25
> 22  26  26  26  26  24  32  32  32  32
> -- 
> 
> Paul
> 650-766-1863
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

-- 
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
