Re: [Jbeta] 10 u: seems broken

bill lam Wed, 04 Jan 2023 22:54:00 -0800

Unless J had separate datatypes for byte and character, as Java does, J could
not store strings as UTF-16.
The only sane choice is what is being used now, leaving users to bear the
burden.
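
For contrast, here is a small sketch in Python (not J) of what "separate
datatypes for byte and character" looks like in a language that has them.
The variable names are illustrative only, and nothing here says how J
should or does work internally.

```python
# Python keeps text (str) and octets (bytes) as distinct types.
text = "♥♦♣♠"                      # str: a sequence of code points
raw = text.encode("utf-8")         # bytes: the UTF-8 code units (octets)
utf16 = text.encode("utf-16-le")   # bytes: UTF-16 code units, 2 octets each

print(type(text).__name__, len(text))   # str 4
print(type(raw).__name__, len(raw))     # bytes 12
print(len(utf16) // 2)                  # 4 (all four suits are in the BMP)
```

Because the two types never mix implicitly, the choice of encoding is made
explicitly at the encode/decode boundary rather than by the user after the fact.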


On Thu, Jan 5, 2023 at 2:12 PM Elijah Stone <[email protected]> wrote:

> And no one believes me when I say strings in j are irreparably broken and
> need to be thrown out and redesigned from scratch...
>
> Your '♥♦♣♠' is intentionally (to borrow from kent pitman) a utf8-encoded
> string, comprising 12 utf8 code units, where each aligned group of three
> encodes a unicode code point representing a suit.  The j datatype associated
> therewith is 'literal', i.e., a sequence of octets.  The display of such
> objects is literal, and your environment is (correctly) interpreting the data
> as utf-8 encoded.
>
> 10 u:y takes y an array of integers, however represented, and gives back an
> array of 'literal4' data of the same length, where each atom of the result
> corresponds to one atom of the input.
>
> Display of literal4 data assumes that they are ucs4-encoded, as you say, and
> further assumes that the environment is utf8-oriented, so je treats each atom
> of a literal4 as representing a code point, and encodes it as utf8.  In other
> words, your code _units_ are being cast as code _points_ (but note that 10 u:
> itself does no interpretation).
>
> 9 u: applied to a literal array interprets it as utf8 and attempts to decode
> it, producing code points represented as literal4.  I expect this is what you
> are looking for.
>
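
The difference between the two conversions described above can be sketched
in Python as a rough analogue. This is not J code: widen_units and
decode_utf8 are hypothetical helper names standing in for the behaviour
attributed to 10 u: and 9 u: in the quoted text.

```python
suits = "♥♦♣♠"
octets = suits.encode("utf-8")   # the 12 UTF-8 code units of the string


def widen_units(data: bytes) -> str:
    """Analogue of 10 u: as described: one code point per input unit, no decoding."""
    return "".join(chr(unit) for unit in data)


def decode_utf8(data: bytes) -> str:
    """Analogue of 9 u: as described: interpret the units as UTF-8 and decode."""
    return data.decode("utf-8")


widened = widen_units(octets)
decoded = decode_utf8(octets)

print(len(widened), [hex(ord(c)) for c in widened[:3]])
# 12 ['0xe2', '0x99', '0xa5']
print(len(decoded), [hex(ord(c)) for c in decoded])
# 4 ['0x2665', '0x2666', '0x2663', '0x2660']
```

Displaying the widened string re-encodes each of those 12 small code points
as UTF-8, which is where output of the "â..." kind comes from.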
>
> On Thu, 5 Jan 2023, Raul Miller wrote:
>
> >   10 u:'♥♦♣♠'
> > â™¥â™¦â™£â™
> >   #10 u:'♥♦♣♠'
> > 12
> >
> > I can't make heads nor tails of this result.
> >
> > nuvoc suggests that 10 u: should be used to generate unicode4 (which
> > probably means that it would use the ucs-4 encoding, containing a
> > utf-32 representation of the argument characters), but while it's
> > literally the case that the result is in J's unicode4 format:
> >
> >   datatype 10 u:'♥♦♣♠'
> > unicode4
> >
> > ... it does not look like the argument characters were encoded in this
> > format.
> >
> > --
> > Raul
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
