Re: [Jbeta] 10 u: seems broken

Elijah Stone Wed, 04 Jan 2023 23:52:14 -0800

J neither has nor lacks 8-bit integers. If you're talking about the performance implications, I addressed those in my proposal a few months ago. Essentially: let people pun characters as octets if they really want to, but strongly discourage it.


On Thu, 5 Jan 2023, bill lam wrote:

But J doesn't have 8bit integers.

On Thu, Jan 5, 2023 at 3:28 PM Elijah Stone <[email protected]> wrote:

'byte' is a representational implementation detail.  We have integers.

On Thu, 5 Jan 2023, bill lam wrote:

> Unless J has a separate datatype for byte and character, as in Java, J
> cannot store strings as utf16.
> The only sane choice is what is being used now, and let users bear the
> burden.
>
> On Thu, Jan 5, 2023 at 2:12 PM Elijah Stone <[email protected]> wrote:
>
>> And no one believes me when I say strings in j are irreparably broken and
>> need to be thrown out and redesigned from scratch...
>>
>> Your '♥♦♣♠' is intentionally (to borrow from kent pitman) a utf8-encoded
>> string, comprising 12 utf8 code units, where each aligned group of three
>> encodes a unicode code point representing a suit.  The j datatype associated
>> therewith is 'literal', i.e., a sequence of octets.  The display of such
>> objects is literal, and your environment is (correctly) interpreting the data
>> as utf-8 encoded.
>>
>> 10 u:y takes y an array of integers, however represented, and gives back an
>> array of 'literal4' data of the same length, where each atom of the result
>> corresponds to one atom of the input.
>>
>> Display of literal4 data assumes that they are ucs4-encoded, as you say, and
>> further assumes that the environment is utf8-oriented, so je treats each atom
>> of a literal4 as representing a code point, and encodes it as utf8.  In other
>> words, your code _units_ are being cast as code _points_ (but note that 10 u:
>> itself does no interpretation).
>>
>> 9 u: applied to a literal array interprets it as utf8 and attempts to decode
>> it, producing code points represented as literal4.  I expect this is what you
>> are looking for.
>>
>>
>> On Thu, 5 Jan 2023, Raul Miller wrote:
>>
>> >   10 u:'♥♦♣♠'
>> > â™¥â™¦â™£â™
>> >   #10 u:'♥♦♣♠'
>> > 12
>> >
>> > I can't make heads nor tails of this result.
>> >
>> > nuvoc suggests that 10 u: should be used to generate unicode4 (which
>> > probably means that it would use the ucs-4 encoding, containing a
>> > utf-32 representation of the argument characters), but while it's
>> > literally the case that the result is in J's unicode4 format:
>> >
>> >   datatype 10 u:'♥♦♣♠'
>> > unicode4
>> >
>> > ... it does not look like the argument characters were encoded in this
>> > format.
>> >
>> > --
>> > Raul
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm

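The code-unit versus code-point distinction discussed in the thread can be sketched outside J. The following is a minimal Python analogy, not J code and not a description of how the J engine is implemented. The names `octets`, `punned`, `decoded`, and `displayed` are illustrative only, and the comments mapping them to `10 u:` and `9 u:` are assumptions based on the explanation quoted above.

```python
# The four suit characters, written as escapes so the source file's own
# encoding cannot affect the result.
suits = "\u2665\u2666\u2663\u2660"

# A utf-8 encoded string: 12 code units (octets), three per suit.
octets = suits.encode("utf-8")

# Assumed analogue of 10 u: on a literal array. Each octet is widened to a
# 32-bit value and later treated as a code point; nothing is decoded.
punned = "".join(chr(b) for b in octets)

# Assumed analogue of 9 u: on a literal array. The octets are decoded as
# utf-8, giving one code point per suit.
decoded = octets.decode("utf-8")

# What a utf-8 oriented display would emit for the punned data: every one of
# the 12 values is 0x80 or above, so each is re-encoded as two octets.
displayed = punned.encode("utf-8")

print(len(octets), len(punned), len(decoded), len(displayed))
print([hex(ord(c)) for c in decoded])
```

Under these assumptions the punned string keeps the length 12 that Raul observed, while the decoded string has length 4 and holds the code points U+2665, U+2666, U+2663, and U+2660.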