Thanks, that explains a lot. Careful reading of your message made me realize it is a feature. If the characters are not UTF8, the code assumes you wanted to display bytes.
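To make the all-or-nothing behaviour concrete, here is a minimal Python sketch of it. This is only a model of what the thread describes, not J's actual implementation, and the function names `whole_vector_decode` and `repairing_decode` are hypothetical. The first function falls back to one code point per byte for the whole vector when it is not valid UTF-8; the second, for contrast, substitutes U+FFFD only for the bytes that cannot be decoded.

```python
def whole_vector_decode(raw: bytes) -> str:
    """Model of the behaviour described above (an assumption, not J source):
    decode as UTF-8 if the whole vector is well formed, otherwise treat
    every byte as its own code point."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps byte value n to code point n, i.e. byte by byte
        return raw.decode("latin-1")


def repairing_decode(raw: bytes) -> str:
    """Contrast: keep the well-formed parts and put U+FFFD only where
    a byte sequence cannot be decoded."""
    return raw.decode("utf-8", errors="replace")


# Second row of 2 6$ v0 from the quoted message: ends in a truncated character.
row = bytes([224, 176, 157, 97, 99, 224])

print(list(whole_vector_decode(row).encode("utf-8")))
# [195, 160, 194, 176, 194, 157, 97, 99, 195, 160]

print([hex(ord(ch)) for ch in repairing_decode(row)])
# ['0xc1d', '0x61', '0x63', '0xfffd']
```

The first result matches the bytes seen in the boxed display of the malformed row; the second keeps ఝ intact and marks only the trailing 224.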
It should be simple to write a _7 u: that inserts 16bfffd (U+FFFD) wherever 7 u: would signal a domain error.

On Sun, May 7, 2017, 10:39 PM bill lam <[email protected]> wrote:

> As far as I can recall, it works this way:
> if a rank-1 character vector is malformed utf8,
> it converts the whole vector (not just the
> illegal characters) byte by byte. In your
> example the second row
>    224 176 157 97 99 224
> is malformed because of the last 224, so it
> is converted to unicode in this way
>    7 u: 224 176 157 97 99 224
> à° acà
>
> After blanks are inserted for formatting, it is
> converted back to utf8
>    a. i. 8 u: 7 u: 224 176 157 97 99 224
> 195 160 194 176 194 157 97 99 195 160
>
> So the round trip doesn't look beautiful. Ideally
> it should convert only the illegal subsequence, in
> this case the last 224 to 195 160, as a repair
>
>    7 u: a.{~224 176 157 97 99 195 160
> ఝacà
>    a.i. 8 u: 7 u: a.{~224 176 157 97 99 195 160
> 224 176 157 97 99 195 160
>
>
> Mon, 08 May 2017, Paul Jackson wrote:
> > I know the previous discussion concluded this wasn't worth fixing, but I've
> > been looking at the causes of damaged output. I've confirmed that the
> > visible appearance of truncated UTF8 characters is due to the environment.
> >    a.i. v0=. 'cఝa'
> > 99 224 176 157 97
> >    224 { a.
> > ʀ
> >    224 176{a.
> > ఊ
> >
> > However, you can also see faults in what default format provides. As an APL
> > developer, I assume it shares code with default output. My tests suggest
> > embedded failures are due to J.
> >    a.i.": <2 6$ v0
> > 16 26 26 26 26 26 26 18 32 32 32 32
> > 25 99 224 176 157 97 99 32 32 25 32 32
> > 25 195 160 194 176 194 157 97 99 195 160 25
> > 22 26 26 26 26 26 26 24 32 32 32 32
> >
> > Note that there is no 195 160 in the text, and it seems 224 176 157 has
> > become 195 160 194 176 194 157. A further example of this behaviour is
> > shown with the next unicode character.
> >    a.i. v1=. 'cఞa'
> > 99 224 176 158 97
> >    a.i.": <2 6$ v1
> > 16 26 26 26 26 26 26 18 32 32 32 32
> > 25 99 224 176 158 97 99 32 32 25 32 32
> > 25 195 160 194 176 194 158 97 99 195 160 25
> > 22 26 26 26 26 26 26 24 32 32 32 32
> >
> > While it should be possible to fix these internal mistakes, there cannot be
> > a safe way to use verbs like
> >    # $ { {. }. } |. |:
> > on UTF8 values, so I still don't know if it is worth fixing.
> >
> > However, running these tests made me realize default format converts
> > everything to UTF8. While the characters are not damaged by reshape, some
> > rows of enclosed arrays will end in blanks.
> >    a.i.": <2 4$7 u: v0
> > 16 26 26 26 26 18 32 32 32 32
> > 25 99 224 176 157 97 99 25 32 32
> > 25 224 176 157 97 99 224 176 157 25
> > 22 26 26 26 26 24 32 32 32 32
> > --
> >
> > Paul
> > 650-766-1863
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3

--
Paul
650-766-1863
