German Umlauts / UTF8 with comparse

Christoph Lange Mon, 17 Feb 2020 05:35:09 -0800

I've read older threads about parsing Japanese with comparse and took some
ideas from there, but I'm still stuck:



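(import comparse utf8 utf8-srfi-14)

(define s "Gänsesäger 2,1")
(define s1 "Rotkehlchen 1,0")

;; Parser that accepts a single character contained in the char set cs
(define (utf8-in cs)
  (satisfies (lambda (c) (char-set-contains? cs c))))

;; A single letter, as defined by char-set:letter
(define letter
  (utf8-in char-set:letter))

;; Between 1 and 20 letters, returned as a string
(define letters
  (as-string (repeated letter 1 20)))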



This is what I have. The 'word' at the beginning of s1 is parsed
completely and correctly by the 'letters' parser:

#;1> (parse letters (string->list s1))
"Rotkehlchen"
#<parser-input lazy-seq #\space #\1 #\, #\0>
; 2 values


For 's', though, I get this:


#;2> (parse letters (string->list s))
"G"
#<parser-input lazy-seq #\ä #\n #\s #\e #\s #\ä #\g #\e #\r #\space ...>
; 2 values



This means that the ä isn't recognized as a letter in 'char-set:letter'.
(The UTF-8 handling of character width does work, on the other hand: in
the remaining input, the ä is represented by a single #\ä. If I don't
import 'utf8' to get the UTF-8-aware string procedures, it would be two
characters.)

Any hints?

/Christoph

-- 
Christoph Lange
Lotsarnas Väg 8
430 83 Vrångö
