Only 7 u:, 8 u: and 9 u: know about the utf8, utf16 and utf32 encodings. All other u: options work on the J datatypes, atom by atom.
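
As a rough sketch of what that means in practice, the two steps can be chained into one verb: 9 u: first converts the utf8 or utf16 input to utf32, then 3 u: returns the integer value of each atom. The name codepoints is made up for this illustration and is not part of J, and I have not checked how it behaves on malformed input.

```j
NB. codepoints is a hypothetical helper name, not a J primitive.
NB. 9&u: converts utf8 (literal) or utf16 (literal2) to utf32 (literal4);
NB. 3&u: then gives the integer value of each resulting atom.
codepoints =: 3&u: @: (9&u:)

codepoints 7 u: 100000    NB. utf16 surrogate pair, as in the quoted session
codepoints '÷'            NB. two utf8 bytes for the division sign
```

Going by the sessions quoted in this thread, these should give 100000 and 247 respectively.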

On Sun, Jun 9, 2019, 9:07 AM bill lam <[email protected]> wrote:

> The behavior of 3&u: is intended. It just returns the internal data
> representation.
>
> To get the unicode codepoint, both utf8 and utf16 need to be converted
> to utf32 using 9&u:, as you tried.
>
>    a=: 7&u: 100000
>    #a
> 2
>    3 u: a
> 55329 56992
>    3 u: 9 u: a
> 100000
>
> On Sun, Jun 9, 2019, 8:35 AM Don Guinn <[email protected]> wrote:
>
>> 3&u: is not recognizing UTF8 codes. Reading the Dictionary, where it
>> says "3 integers char, literal2 or literal4", it is not clear if it
>> recognizes UTF codes or only converts atoms to numbers. In this case,
>> when presented with UTF8 it converts each byte to a number. But if this
>> is the way it's supposed to work, then there is no tool in u: that
>> guarantees finding the numeric value or code point, as even UTF32 has
>> an escape range in it as well to go to multiple atoms.
>>
>> I guess what I'm looking for is a way to reliably convert U8, U16 and
>> U32 to integers.
>>
>> This is the same in J8 and J9.
>>
>>    z=:'÷'
>>    #z
>> 2
>>    #7 u:z
>> 1
>>    #9 u:z
>> 1
>>    #3 u:z
>> 2
>>    3 u: 9 u:z
>> 247
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
