Re: [fpc-pascal] Parse unicode scalar

Nikolay Nikolov via fpc-pascal Sun, 02 Jul 2023 10:20:22 -0700

On 7/2/23 16:30, Hairy Pixels via fpc-pascal wrote:

I'm interested in parsing unicode scalars (I think they're called) to byte 
sized values but I'm not sure where to start. First thing I did was choose the 
unicode scalar U+1F496 (💖).


There's no such thing as "unicode scalar" in Unicode terminology:

https://unicode.org/glossary/

So, what do you mean? A Unicode code point? An Extended Grapheme Cluster? Or something else? There are also several ways to encode Unicode into a byte sequence - UTF-8, UTF-16LE, UTF-16BE, UTF-32, etc.


Next I cheated and asked ChatGPT. :) Amazingly, from my question it was able to 
tell me the scalar is comprised of these 4 bytes:

  240 159 146 150

I was able to correctly concatenate these characters and writeln printed the 
correct character.

var
  s: String;
begin
  s := char(240)+char(159)+char(146)+char(150);
  writeln(s);
end.

The question is, how was 1F496 decomposed into 4 bytes?


I guess you should ask ChatGPT, who gave you the answer ;-)
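
For what it's worth, those four bytes match the UTF-8 encoding of the code point U+1F496: code points above U+FFFF are split into 3+6+6+6 bits, and each group gets a fixed prefix (11110xxx for the lead byte, 10xxxxxx for the three continuation bytes). A minimal sketch of that bit-splitting, assuming the input is a valid code point (there is no check for surrogates or for values above U+10FFFF); EncodeUtf8 and TUtf8Bytes are names made up for this example, not RTL routines:

```pascal
program Utf8Demo;

{$mode objfpc}{$H+}

type
  TUtf8Bytes = array of Byte;  // hypothetical helper type for this sketch

// Encode a single Unicode code point as UTF-8 bytes.
// Assumption: cp is a valid code point; no validation is done.
function EncodeUtf8(cp: Cardinal): TUtf8Bytes;
begin
  Result := nil;
  if cp <= $7F then
  begin
    // 1 byte: 0xxxxxxx
    SetLength(Result, 1);
    Result[0] := Byte(cp);
  end
  else if cp <= $7FF then
  begin
    // 2 bytes: 110xxxxx 10xxxxxx
    SetLength(Result, 2);
    Result[0] := Byte($C0 or (cp shr 6));
    Result[1] := Byte($80 or (cp and $3F));
  end
  else if cp <= $FFFF then
  begin
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 3);
    Result[0] := Byte($E0 or (cp shr 12));
    Result[1] := Byte($80 or ((cp shr 6) and $3F));
    Result[2] := Byte($80 or (cp and $3F));
  end
  else
  begin
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 4);
    Result[0] := Byte($F0 or (cp shr 18));
    Result[1] := Byte($80 or ((cp shr 12) and $3F));
    Result[2] := Byte($80 or ((cp shr 6) and $3F));
    Result[3] := Byte($80 or (cp and $3F));
  end;
end;

var
  encoded: TUtf8Bytes;
  i: Integer;
begin
  encoded := EncodeUtf8($1F496);
  // Print each byte in decimal, separated by spaces.
  for i := 0 to High(encoded) do
    Write(encoded[i], ' ');
  WriteLn;
end.
```

For U+1F496 the four 6-bit-or-smaller groups are 0, 31, 18 and 22, so this prints 240 159 146 150, the same bytes as in your example.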

Nikolay

_______________________________________________
fpc-pascal maillist  -  [email protected]
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
