Hello, A few notes regarding the latest changes in `string-abstraction'.
"Michael Gran" <spk...@yahoo.com> writes: > --- a/libguile/read.c > +++ b/libguile/read.c > @@ -186,13 +186,42 @@ static inline SCM scm_read_scsh_block_comment (int chr, > SCM port); > /* Read from PORT until a delimiter (e.g., a whitespace) is read. Return > zero if the whole token fits in BUF, non-zero otherwise. */ > static inline int > -read_token (SCM port, char *buf, size_t buf_size, size_t *read) > +read_token (SCM port, scm_t_uint32 *buf, size_t buf_size, size_t *read) > { > *read = 0; > > while (*read < buf_size) > { > - int chr; > + scm_t_uint32 chr; > + > + chr = scm_getc (port); > + chr = (SCM_CASE_INSENSITIVE_P ? CHAR_DOWNCASE (chr) : chr); > + > + if (chr == (scm_t_uint32) EOF) > + return 0; > + else if (CHAR_IS_DELIMITER (chr)) > + { > + scm_ungetc (chr, port); > + return 0; > + } > + else > + { > + *buf = chr; > + buf++, (*read)++; > + } > + } > + > + return 1; > +} > + > +static inline int > +read_token2 (SCM port, char *buf, size_t buf_size, size_t *read) Is `read_token2 ()' temporary? (I didn't see it mentioned in the log.) Otherwise I'd vote for a more descriptive name. > static scm_t_uint32 BTW, I'm wondering whether we should have a separate type for wide chars, to make the intent clearer (say, `scm_t_char'). > --- a/test-suite/tests/ports.test > +++ b/test-suite/tests/ports.test > @@ -441,14 +441,14 @@ > (display "x\a" port) > (= 1 (port-column port)))) > > - (pass-if "\\x08 backspace" > + (pass-if "\\x08; backspace" > (let ((port (open-output-string))) > - (display "\x08" port) > + (display "\x08;" port) [...] > (define exception:illegal-escape > - (cons 'read-error "illegal character in escape sequence: .*$")) > + (cons 'read-error "invalid character in escape sequence: .*$")) [...] > (pass-if "CR recognized as a token delimiter" > ;; In 1.8.3, character 0x0d was not recognized as a delimiter. > - (equal? (read-string "one\x0dtwo") 'one)) > + (equal? (read-string "one\x0d;two") 'one)) We're introducing subtle incompatibilities here, which we'd rather avoid, by default at least. Could you make that a reader option? Likewise, I think we should avoid gratuitously changing exception messages since, unfortunately, they are sort-of part of the API. > --- a/test-suite/tests/regexp.test > +++ b/test-suite/tests/regexp.test > @@ -22,6 +22,8 @@ > #:use-module (test-suite lib) > #:use-module (ice-9 regex)) > > +(setlocale LC_ALL "en_US.iso88591") The user doesn't necessarily have this locale. Can you instead use something similar to `under-french-locale-or-unresolved' in `i18n.test' or `find-latin1-locale' in `srfi-14.test', perhaps moving it to the `(test-suite lib)' module? Thanks, Ludo'.