I think this is the intended behavior. Your string is a byte list holding
UTF-8 encoded text, and you can use 9&u: to convert it to unicode4.
10&u: converts atom by atom, so each of the 12 bytes becomes its own
unicode4 character.

x = 9: Convert to unicode4 unless all characters are ASCII
  - byte: leave unchanged.
  - unicode4: any character precision containing a UCP > 127, or an
    integer in (0,16b10ffff). Convert to unicode4. Any UTF-8 is
    converted to unicode4, and surrogate pairs in unicode are converted.

x = 10: Convert to unicode4
  - unicode4: any character precision, or an integer in (0,16b10ffff).
    Convert to unicode4.
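
A short session sketch may make the difference concrete. It assumes the
literal is entered as UTF-8 bytes, as in the session quoted below. The
name s is only for illustration, and the comments describe what the
rules above imply rather than captured output:

```j
   s =: '♥♦♣♠'      NB. byte list: the UTF-8 encoding, 3 bytes per symbol
   # s              NB. 12 bytes
   # 9 u: s         NB. 4, since 9 u: decodes the UTF-8 into code points
   # 10 u: s        NB. 12, since 10 u: widens each byte separately
   3 u: 9 u: s      NB. integer code points of the four symbols
   3 u: 10 u: s     NB. integer values of the twelve individual bytes
```

So 10 u: only gives the characters you expect when each atom of the
argument is already a complete code point; for UTF-8 text, 9 u: is the
conversion that decodes it.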
------------------------------

On Thu, Jan 5, 2023 at 1:46 PM Raul Miller wrote:

>    10 u:'♥♦♣♠'
> ♥♦♣â™
>    #10 u:'♥♦♣♠'
> 12
>
> I can't make heads nor tails of this result.
>
> nuvoc suggests that 10 u: should be used to generate unicode4 (which
> probably means that it would use the ucs-4 encoding, containing a
> utf-32 representation of the argument characters), but while it's
> literally the case that the result is in J's unicode4 format:
>
>    datatype 10 u:'♥♦♣♠'
> unicode4
>
> ... it does not look like the argument characters were encoded in this
> format.
>
> --
> Raul
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
