And no one believes me when I say strings in j are irreparably broken and need
to be thrown out and redesigned from scratch...
Your '♥♦♣♠' is intentionally (to borrow from kent pitman) a utf8-encoded
string, comprising 12 utf8 code units, where each aligned group of three
encodes a unicode code point representing a suit. The j datatype associated
therewith is 'literal', i.e., a sequence of octets. The display of such
objects is literal, and your environment is (correctly) interpreting the data
as utf-8 encoded.
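To make the count concrete, here is a small sketch in python rather than j, treating a bytes object as a stand-in for a j literal. The variable names are mine and purely illustrative; the only assumption is that the string is utf8-encoded, as described above.

```python
# python stand-in for the j literal: a bytes object holding utf8 code units
suits = "♥♦♣♠"
units = suits.encode("utf-8")

# total number of code units (octets)
unit_count = len(units)

# each aligned group of three code units decodes to one code point
groups = [units[i:i + 3] for i in range(0, len(units), 3)]
decoded = [g.decode("utf-8") for g in groups]

print(unit_count)    # 12
print(list(units))   # the raw octets as integers
print(decoded)       # one suit per group of three
```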
10 u: y takes y, an array of integers (however represented), and gives back an
array of 'literal4' data of the same length, where each atom of the result
corresponds to one atom of the input.
Display of literal4 data assumes that they are ucs4-encoded, as you say, and
further assumes that the environment is utf8-oriented, so je treats each atom
of a literal4 as representing a code point, and encodes it as utf8. In other
words, your code _units_ are being cast as code _points_ (but note that 10 u:
itself does no interpretation).
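The cast from code units to code points can be modelled roughly in python. This is only an analogy for what 10 u: and the display step do, not a description of how je is implemented, and the names are made up for the example.

```python
units = "♥♦♣♠".encode("utf-8")

# rough analogue of 10 u: one code point per code unit, no decoding at all
as_points = [chr(b) for b in units]

# rough analogue of display: each of those code points is encoded as utf8
shown = "".join(as_points).encode("utf-8")

print(len(as_points))   # 12, same length as the input
print(len(shown))       # 24, since every unit here is >= 0x80 and needs two octets
```

The twelve input octets come back as twelve unrelated code points in the range U+0080 to U+00FF, which is why the displayed result looks like noise.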
9 u: applied to a literal array interprets it as utf8 and attempts to decode
it, producing code points represented as literal4. I expect this is what you
are looking for.
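For comparison, the decoding behaviour described for 9 u: corresponds roughly to the following python sketch (again an analogy with illustrative names, not j itself):

```python
units = "♥♦♣♠".encode("utf-8")

# interpret the octets as utf8 and recover the code points
points = [ord(c) for c in units.decode("utf-8")]

print(len(points))   # 4
print(points)        # [9829, 9830, 9827, 9824]
```

Four code points come back, one per suit, rather than twelve.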
On Thu, 5 Jan 2023, Raul Miller wrote:
   10 u:'♥♦♣♠'
♥♦♣â™
   #10 u:'♥♦♣♠'
12
I can't make heads nor tails of this result.
nuvoc suggests that 10 u: should be used to generate unicode4 (which
probably means that it would use the ucs-4 encoding, containing a
utf-32 representation of the argument characters), but while it's
literally the case that the result is in J's unicode4 format:
   datatype 10 u:'♥♦♣♠'
unicode4
... it does not look like the argument characters were encoded in this format.
--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm