Re: [Jbeta] 10 u: seems broken

bill lam Wed, 04 Jan 2023 23:30:52 -0800

But J doesn't have 8-bit integers.

On Thu, Jan 5, 2023 at 3:28 PM Elijah Stone <[email protected]> wrote:


> 'byte' is a representational implementation detail.  We have integers.
>
> On Thu, 5 Jan 2023, bill lam wrote:
>
> > Unless J has separate datatypes for bytes and characters, as Java does, J
> > cannot store strings as UTF-16.
> > The only sane choice is what is being used now, letting users bear the
> > burden.
> >
> > On Thu, Jan 5, 2023 at 2:12 PM Elijah Stone <[email protected]> wrote:
> >
> >> And no one believes me when I say strings in J are irreparably broken and
> >> need to be thrown out and redesigned from scratch...
> >>
> >> Your '♥♦♣♠' is intentionally (to borrow from Kent Pitman) a UTF-8-encoded
> >> string, comprising 12 UTF-8 code units, where each aligned group of three
> >> encodes a Unicode code point representing a suit.  The J datatype
> >> associated therewith is 'literal', i.e., a sequence of octets.  Such
> >> objects are displayed literally, and your environment is (correctly)
> >> interpreting the data as UTF-8 encoded.
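To make the byte-level picture concrete, here is a small illustrative sketch. It is written in Python only as a stand-in, not J. It encodes the four suit characters as UTF-8, confirms there are 12 code units, and splits them into aligned groups of three, decoding each group back to its code point.

```python
# Illustration only: Python stands in for J to show the UTF-8 layout of '♥♦♣♠'.
suits = "\u2665\u2666\u2663\u2660"  # ♥ ♦ ♣ ♠

# UTF-8 encoding: a plain sequence of octets (what J holds as 'literal').
units = suits.encode("utf-8")
print(len(units))  # 12 code units

# Each aligned group of three octets encodes exactly one code point.
groups = [units[i:i + 3] for i in range(0, len(units), 3)]
for g in groups:
    print(g.hex(" "), "->", f"U+{ord(g.decode('utf-8')):04X}")
# e2 99 a5 -> U+2665
# e2 99 a6 -> U+2666
# e2 99 a3 -> U+2663
# e2 99 a0 -> U+2660
```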
> >>
> >> 10 u: y takes an array y of integers, however represented, and gives back
> >> an array of 'literal4' data of the same length, where each atom of the
> >> result corresponds to one atom of the input.
> >>
> >> Display of literal4 data assumes that they are UCS-4-encoded, as you say,
> >> and further assumes that the environment is UTF-8-oriented, so JE treats
> >> each atom of a literal4 array as representing a code point and encodes it
> >> as UTF-8.  In other words, your code _units_ are being cast as code
> >> _points_ (but note that 10 u: itself does no interpretation).
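A rough Python analogue of "code units cast as code points", offered as an illustration rather than J's implementation. Here `chr` applied to each octet plays the role that the explanation above assigns to `10 u:`, mapping each integer straight to the code point with that value and never decoding. The result has 12 atoms instead of 4, and the first suit turns into U+00E2 ('â'), U+0099 (a control character) and U+00A5 ('¥').

```python
# Illustration only: mimic "code units cast as code points" in Python.
suits = "\u2665\u2666\u2663\u2660"  # ♥ ♦ ♣ ♠
units = suits.encode("utf-8")       # 12 octets

# Treat each octet's integer value directly as a code point (no decoding).
cast = "".join(chr(b) for b in units)
print(len(cast))                               # 12, matching #10 u:'♥♦♣♠'
print([f"U+{ord(c):04X}" for c in cast[:3]])   # ['U+00E2', 'U+0099', 'U+00A5']

# Displaying 'cast' in a UTF-8 environment re-encodes each of these code
# points; all lie in U+0080..U+00FF, so each needs two octets.
print(len(cast.encode("utf-8")))               # 24
```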
> >>
> >> 9 u: applied to a literal array interprets it as UTF-8 and attempts to
> >> decode it, producing code points represented as literal4.  I expect this
> >> is what you are looking for.
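For contrast, here is a minimal Python sketch of the decoding route. `bytes.decode("utf-8")` is used only as an analogue of what the message above says `9 u:` does. J's own handling of edge cases such as invalid UTF-8 may differ. Decoding the 12 octets yields the 4 intended code points.

```python
# Illustration only: decode the octets as UTF-8, analogous to 9 u: above.
suits_utf8 = "\u2665\u2666\u2663\u2660".encode("utf-8")  # 12 octets, like '♥♦♣♠' in J

decoded = suits_utf8.decode("utf-8")   # interpret the octets as UTF-8
points = [ord(c) for c in decoded]     # one integer per code point

print(len(decoded))                    # 4
print([f"U+{p:04X}" for p in points])  # ['U+2665', 'U+2666', 'U+2663', 'U+2660']
```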
> >>
> >>
> >> On Thu, 5 Jan 2023, Raul Miller wrote:
> >>
> >> >   10 u:'♥♦♣♠'
> >> > â™¥â™¦â™£â™
> >> >   #10 u:'♥♦♣♠'
> >> > 12
> >> >
> >> > I can't make heads nor tails of this result.
> >> >
> >> > NuVoc suggests that 10 u: should be used to generate unicode4 (which
> >> > probably means that it would use the ucs-4 encoding, containing a
> >> > utf-32 representation of the argument characters), but while it's
> >> > literally the case that the result is in J's unicode4 format:
> >> >
> >> >   datatype 10 u:'♥♦♣♠'
> >> > unicode4
> >> >
> >> > ... it does not look like the argument characters were encoded in this
> >> > format.
> >> >
> >> > --
> >> > Raul
> >> > ----------------------------------------------------------------------
> >> > For information about J forums see http://www.jsoftware.com/forums.htm