Yes, this helps. Kind of ;-) ... using the character set char-set:alphabetic, my umlauts are now parsed. But I don't get them back in my result, at least not as printable characters. Instead, the following happens, and utterly confuses me:
#;2> (define s3 (parse letters (string->list s)))
#;3> s3
"Gnsesger"
#;4> (string-length s3)
6
#;5> (string->list s3)
(#\G #\x4bb3 #\e #\s #\x49e5 #\r)
#;6> (list->string (string->list s3))
"G䮳es䧥r"

So, I put the parse result into 's3'. Printing it, I see an eight-character string, namely the one I want, minus my beloved umlauts. 'string-length' reports that string as only six characters long, and 'string->list' gives me exactly those six, swallowing still other ASCII characters of my string. Converting the list back with 'list->string' even produces Chinese characters ... whereas '(list->string (string->list s1))' on my pure ASCII string round-trips without fault.

I guess I have some problems understanding some UTF-8 concepts?!

/Christoph

On Mon, Feb 17, 2020 at 3:38 PM <[email protected]> wrote:
> Christoph Lange <[email protected]> wrote:
> > meaning, that the ä isn't recognized as being a letter within the
> > 'char-set:letter'.
>
> The utf8 egg’s srfi-14 character sets are designed to be compatible with
> the original srfi-14 and only contain ASCII characters, as stated in the
> documentation:
> https://wiki.call-cc.org/eggref/5/utf8#unicode-char-sets
> “The default SRFI-14 char-sets are defined using ASCII-only characters”
>
> You might want to import the unicode-char-sets module, and use one of its
> sets, like char-set:alphabetic.
>
> I hope this helps. :)

--
Christoph Lange
Lotsarnas Väg 8
430 83 Vrångö
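The numbers in the transcript are consistent with one specific byte-level explanation, sketched below in Python purely as an illustration (this is not CHICKEN's code). Two assumptions, both hypothetical: first, that 's' held the word "Gänsesäger" (reconstructed from the arithmetic, since the umlauts do not survive in the printed output), and that the parse result stored each ä as the single Latin-1 byte 0xE4 instead of the two UTF-8 bytes 0xC3 0xA4; second, that the utf8 egg's decoder takes a sequence's length from its lead byte and does not check the following bytes. 0xE4 looks like the start of a three-byte UTF-8 sequence, so the decoder swallows the next two ASCII bytes ("ns", later "ge") and yields U+4BB3 and U+49E5, which is exactly the six-character list shown above. The helper name lenient_utf8_decode is made up for this sketch.

```python
def lenient_utf8_decode(data: bytes) -> list:
    """Decode bytes as UTF-8 WITHOUT validating continuation bytes.

    Hypothetical model of a lenient decoder: the lead byte alone decides
    how many bytes belong to the character.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:            # 0xxxxxxx: plain ASCII
            n, cp = 1, b
        elif b >= 0xF0:         # 11110xxx: 4-byte sequence
            n, cp = 4, b & 0x07
        elif b >= 0xE0:         # 1110xxxx: 3-byte sequence
            n, cp = 3, b & 0x0F
        elif b >= 0xC0:         # 110xxxxx: 2-byte sequence
            n, cp = 2, b & 0x1F
        else:                   # stray continuation byte: pass it through
            n, cp = 1, b
        if i + n > len(data):   # truncated sequence at the end: keep the raw byte
            n, cp = 1, b
        for k in range(1, n):
            # No check that data[i + k] really is a 10xxxxxx continuation byte.
            cp = (cp << 6) | (data[i + k] & 0x3F)
        out.append(cp)
        i += n
    return out


word = "Gänsesäger"  # assumption: reconstructed content of 's'

# One byte per character (ä -> 0xE4), then read back as UTF-8:
broken = lenient_utf8_decode(word.encode("latin-1"))
print(len(broken))                  # -> 6
print([hex(c) for c in broken])     # -> ['0x47', '0x4bb3', '0x65', '0x73', '0x49e5', '0x72']
print("".join(map(chr, broken)))    # the "Chinese" string from the transcript

# Proper UTF-8 bytes (ä -> 0xC3 0xA4) decode back to all ten characters:
fixed = lenient_utf8_decode(word.encode("utf-8"))
print("".join(map(chr, fixed)))     # -> Gänsesäger
```

If this reading is right, the umlauts are already lost (or rather mis-encoded) when the parser builds its result string, so the fix would presumably be to make sure that string is built with UTF-8-aware procedures such as the utf8 egg's; which procedure the parser actually uses is not visible in the transcript.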
