>>  >-Returns the number of characters in the given @var{string}.
>> +Returns the number of bytes in the given @var{string}.
>>  
>> This is false. For example, (string-length "😀") is 1, whereas in all 
>> encodings I know of it is more than one byte. Also, R5RS says: [...]
>
>Maybe `the number of codepoints` will work here.
>
>(string-length "👨‍🏭") ;; => 3
>(string-length "é") ;; => 2
>
>The number of characters here is 1 in both cases.

No, in Unicode (and Guile equates character=Unicode character) all characters 
correspond to a single codepoint.
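
As a rough illustration of the codepoint/byte distinction (a sketch only; 
the variable name `s` is mine, and the character is built from its 
codepoint number so the result does not depend on how the source file or 
terminal is decoded):

```scheme
(use-modules (rnrs bytevectors))   ; provides string->utf8, bytevector-length

;; U+1F600 (the grinning-face emoji), built from its codepoint number.
(define s (string (integer->char #x1F600)))

(string-length s)                      ; => 1  (one character, one codepoint)
(bytevector-length (string->utf8 s))   ; => 4  (four bytes in UTF-8)
```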

You need to fix your setup, that’s not what Guile does. Are you sure you have 
set the encoding of current-input-port correctly? (Probably by setting LC_ALL 
or the like to a UTF-8 locale.) Otherwise the 3 bytes in the UTF-8 encoding 
might be interpreted in terms of some 8-bit encoding.
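
A minimal sketch of how one might check this from Scheme, assuming a 
UTF-8 locale is actually available on the system (whether the explicit 
setlocale call is needed depends on your Guile version and how it was 
started):

```scheme
;; Adopt the locale from the environment (LC_ALL, LANG, ...).
(setlocale LC_ALL "")

;; Show which encoding the input port currently uses.
(display (port-encoding (current-input-port)))
(newline)

;; If it is not UTF-8, it can be set explicitly on the port.
(set-port-encoding! (current-input-port) "UTF-8")
```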

Here’s a test: if you can input #\👨‍🏭 without errors and it evaluates to #\👨‍🏭, 
then the encoding should be set up correctly.

Best regards,
Maxime Devos
