And no one believes me when I say strings in j are irreparably broken and need to be thrown out and redesigned from scratch...

Your '♥♦♣♠' is intentionally (to borrow from kent pitman) a utf8-encoded string, comprising 12 utf8 code units, where each aligned group of three encodes a unicode code point representing a suit. The j datatype associated therewith is 'literal', i.e., a sequence of octets. The display of such objects is literal, and your environment is (correctly) interpreting the data as utf-8 encoded.
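The 12-code-unit count can be checked outside j. The following is only a python sketch of the byte layout, not j code; the variable names are made up for illustration.

```python
# The four suit characters, as a Python str (a sequence of code points).
suits = "♥♦♣♠"

# Encode to utf-8: this is the sequence of octets that a j 'literal' holds.
units = suits.encode("utf-8")
print(len(units))  # 12

# Each aligned group of three octets encodes one code point.
groups = [units[i:i + 3] for i in range(0, len(units), 3)]
print([g.hex() for g in groups])

# Decoding each group on its own recovers one suit's code point.
code_points = [ord(g.decode("utf-8")) for g in groups]
print([hex(cp) for cp in code_points])
```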

10 u: y takes y, an array of integers, however represented, and gives back an array of 'literal4' data of the same length, where each atom of the result corresponds to one atom of the input.

Display of literal4 data assumes that they are ucs4-encoded, as you say, and further assumes that the environment is utf8-oriented, so je treats each atom of a literal4 as representing a code point, and encodes it as utf8. In other words, your code _units_ are being cast as code _points_ (but note that 10 u: itself does no interpretation).
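To make the units-cast-as-points effect concrete, here is a python sketch that mimics it. It assumes the behaviour described above (each octet value becomes one code point, and display re-encodes as utf8); it is an analogy, not a transcript of what je does internally.

```python
# The utf-8 octets of the original string (what the j literal contains).
octets = "♥♦♣♠".encode("utf-8")

# Mimic the cast: every octet value is taken as a code point in its own right.
miscast = "".join(chr(b) for b in octets)
print(len(miscast))  # 12, one character per original code unit

# Mimic the display step: each of those code points is encoded as utf-8.
# All 12 values are >= 0x80, so each one needs two octets.
displayed = miscast.encode("utf-8")
print(len(displayed))  # 24
```

The first character of the miscast string is U+00E2, which is why an 'â' shows up in the output.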

9 u: applied to a literal array interprets it as utf8 and attempts to decode it, producing code points represented as literal4. I expect this is what you are looking for.
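For contrast, a python sketch of the decode step: interpret the octets as utf8 and get back one code point per suit. This illustrates the idea only and says nothing about how 9 u: is implemented.

```python
# Start from the same utf-8 octets.
raw = "♥♦♣♠".encode("utf-8")

# Decode as utf-8: 12 code units collapse to 4 code points.
decoded = raw.decode("utf-8")
print(len(decoded))  # 4

# The code point values, one per suit.
points = [ord(c) for c in decoded]
print(points)  # [9829, 9830, 9827, 9824]
```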


On Thu, 5 Jan 2023, Raul Miller wrote:

  10 u:'♥♦♣♠'
♥♦♣â™
  #10 u:'♥♦♣♠'
12

I can't make heads nor tails of this result.

nuvoc suggests that 10 u: should be used to generate unicode4 (which
probably means that it would use the ucs-4 encoding, containing a
utf-32 representation of the argument characters), but while it's
literally the case that the result is in J's unicode4 format:

  datatype 10 u:'♥♦♣♠'
unicode4

... it does not look like the argument characters were encoded in this format.

--
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm