On 2024-06-26 13:46, Maxime Devos wrote:

>>> >-Returns the number of characters in the given @var{string}.
>>> +Returns the number of bytes in the given @var{string}.
>>>
>>> This is false. For example, (string-length "π") is 1, whereas in all
>>> encodings I know of it is more than one byte. Also, R5RS says: [...]
>>
>>Maybe `the number of codepoints` will work here.
>>
>>(string-length "π¨βπ") ;; => 3
>>(string-length "eΜ") ;; => 2
>>
>>The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all characters
> correspond to a single codepoint.
>
> You need to fix your setup, that's not what Guile does. Are you sure you have
> set the encoding of current-input-port correctly? (Probably by setting LC_ALL
> or the like to a UTF-8 locale.) Otherwise the 3 bytes in the UTF-8 encoding
> might be interpreted in terms of some 8-bit encoding.
>
> Here's a test: if you can input #\π¨βπ without errors and it evaluates to
> #\π¨βπ, then the encoding should be set up correctly.

(setlocale LC_ALL) ;; => "en_US.utf8"
(display #\π¨βπ) ;; => /home/bob/guile-ares-rs/dev/guile/tmp.scm:84:15: unknown character name π¨βπ

The same happens if I do it in the usual REPL:

LC_ALL=en_US.utf8 guile

-- 
Best regards,
Andrew Tropin