Only 7 u:, 8 u: and 9 u: know about the UTF-8, UTF-16 and UTF-32 encodings.

All the other u: variants work on J datatypes, atom by atom.
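
As a rough illustration of that distinction, here is a short session sketch. It assumes a J8 or later session in which a literal such as '÷' is stored as UTF-8 bytes, and the name cp is only a hypothetical helper introduced for this example, not part of J:

```j
   NB. cp: hypothetical helper; convert to UTF-32 first, then take the integers
   cp =: 3 : '3 u: 9 u: y'

   z =: '÷'
   3 u: z             NB. datatype view: one integer per byte
195 183
   cp z               NB. encoding-aware view: one code point
247

   a =: 7 u: 100000   NB. outside the BMP, so two UTF-16 atoms
   3 u: a             NB. the surrogate pair as stored
55329 56992
   cp a
100000
```

The helper just composes the two steps discussed in this thread, so it should give the same answers for UTF-8 and UTF-16 input alike.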

On Sun, Jun 9, 2019, 9:07 AM bill lam <[email protected]> wrote:

> The behavior of 3&u: is intended. It just returns the internal data
> representation.
>
> To get Unicode code points, both UTF-8 and UTF-16 need to be converted to
> UTF-32 using 9&u:, as you tried.
>
>    a=: 7&u: 100000
>    #a
> 2
>    3 u: a
> 55329 56992
>    3 u: 9 u: a
> 100000
>
> On Sun, Jun 9, 2019, 8:35 AM Don Guinn <[email protected]> wrote:
>
>> 3&u: does not recognize UTF-8 codes. The Dictionary says
>>    3 integers char,literal2 or literal4
>> but it is not clear whether it recognizes UTF codes or only converts
>> atoms to numbers. In this case, when presented with UTF-8, it converts
>> each byte to a number. But if this is the way it is supposed to work,
>> then there is no tool in u: that is guaranteed to find the numeric value
>> or code point, since even UTF-32 has an escape range that can go to
>> multiple atoms.
>>
>> I guess what I'm looking for is a reliable way to convert U8, U16 and
>> U32 to integers.
>>
>> This is the same in J8 and J9.
>>
>>    z=:'÷'
>>    #z
>> 2
>>    #7 u:z
>> 1
>>    #9 u:z
>> 1
>>    #3 u:z
>> 2
>>    3 u: 9 u:z
>> 247
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>
