'byte' is a representational implementation detail.  We have integers.

On Thu, 5 Jan 2023, bill lam wrote:

Unless J had separate datatypes for byte and character, as in Java, J cannot
store strings as utf16.
The only sane choice is what is being used now, letting users bear the
burden.

On Thu, Jan 5, 2023 at 2:12 PM Elijah Stone wrote:

And no one believes me when I say strings in j are irreparably broken and need
to be thrown out and redesigned from scratch...

Your '♥♦♣♠' is intentionally (to borrow from kent pitman) a utf8-encoded
string, comprising 12 utf8 code units, where each aligned group of three
encodes a unicode code point representing a suit.  The j datatype associated
therewith is 'literal', i.e., a sequence of octets.  The display of such
objects is literal, and your environment is (correctly) interpreting the data
as utf-8 encoded.
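
The "12 code units in aligned groups of three" claim can be checked outside
of j.  The following is a minimal Python sketch, not j code; it assumes that a
j 'literal' atom can be modelled as one octet of a Python bytes object, and
the variable names are made up for illustration.

```python
# Model the j 'literal' array as raw octets (assumption: one atom == one byte).
suits = "♥♦♣♠"
code_units = suits.encode("utf-8")

# Split the octets into aligned groups of three, one group per suit.
groups = [code_units[i:i + 3] for i in range(0, len(code_units), 3)]

# Each group decodes, on its own, to exactly one code point.
decoded = [g.decode("utf-8") for g in groups]

print(len(code_units))            # number of utf8 code units
print([g.hex() for g in groups])  # the octets behind each suit
print(decoded)
```

All four suits lie between U+2660 and U+2666, a range in which utf8 uses
three octets per code point, which is why the groups line up.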

10 u:y takes y, an array of integers, however represented, and gives back an
array of 'literal4' data of the same length, where each atom of the result
corresponds to one atom of the input.

Display of literal4 data assumes that they are ucs4-encoded, as you say, and
further assumes that the environment is utf8-oriented, so je treats each atom
of a literal4 as representing a code point, and encodes it as utf8.  In other
words, your code _units_ are being cast as code _points_ (but note that 10 u:
itself does no interpretation).
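
The cast from code units to code points can be imitated in Python.  This is
only a model of the behaviour described above, under the assumption that
widening each octet to a code point of the same numeric value is a fair
stand-in for 10 u: followed by display; it is not how je is implemented.

```python
# The original utf8 code units (octets) of the four suits.
code_units = "♥♦♣♠".encode("utf-8")

# Stand-in for 10 u: -- each octet becomes one code point with the same value.
# No decoding happens here, so the length is unchanged.
widened = "".join(chr(b) for b in code_units)

# Stand-in for display: every code point is encoded as utf8 for the terminal.
rendered = widened.encode("utf-8")

print(len(widened))             # still one atom per input octet
print([hex(ord(c)) for c in widened])
print(len(rendered))            # octets actually sent to the display
```

Because every one of the twelve octets is 0x80 or above, each widened code
point needs two utf8 octets on output, and several of them (such as U+0099)
are control characters with no visible glyph, which would account for the
confusing display.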

9 u: applied to a literal array interprets it as utf8 and attempts to decode
it, producing code points represented as literal4.  I expect this is what you
are looking for.
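
For contrast, here is a Python sketch of the decoding behaviour attributed to
9 u: above.  It assumes that a utf8 decode of the octets is an adequate model;
error handling for malformed input is not covered, and the names are
illustrative only.

```python
# The same twelve octets as before.
code_units = "♥♦♣♠".encode("utf-8")

# Stand-in for 9 u: -- interpret the octets as utf8 and decode them,
# yielding one code point per character rather than one per octet.
code_points = [ord(c) for c in code_units.decode("utf-8")]

print(len(code_points))
print([f"U+{cp:04X}" for cp in code_points])
```

The result has four atoms instead of twelve, one per suit.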


On Thu, 5 Jan 2023, Raul Miller wrote:

>   10 u:'♥♦♣♠'
> ♥♦♣â™
>   #10 u:'♥♦♣♠'
> 12
>
> I can't make heads nor tails of this result.
>
> nuvoc suggests that 10 u: should be used to generate unicode4 (which
> probably means that it would use the ucs-4 encoding, containing a
> utf-32 representation of the argument characters), but while it's
> literally the case that the result is in J's unicode4 format:
>
>   datatype 10 u:'♥♦♣♠'
> unicode4
>
> ... it does not look like the argument characters were encoded in
> this format.
>
> --
> Raul
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm