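Hello,

Mike Gran <spk...@yahoo.com> writes:

> When you write...
>
> +  /* Create a copy of STR in the encoding of Z.  */
> +  buf = scm_to_stringn (str, &str_len, pt->encoding,
> +                        SCM_FAILED_CONVERSION_ERROR);
> +  /* FIXME: strdup doesn't do the right thing if BUF contains zeros, but we
> +     don't know the size in bytes of STR.  */
> +  c_str = scm_gc_strdup (buf, "strport");
> +  free (buf);
>
> ... isn't the returned value str_len the length in bytes of buf?

The (undocumented) ‘scm_to_stringn ()’ returns the number of characters,
AFAICS.

> I think you could avoid the strdup call, since it could fail, for
> example, for UTF-32 strings of more than one character.

Yes, that sucks.  Probably we need a function to know the number of
bytes of a string.  Thoughts?

> Also, in the big scheme of things, I wonder if the name "string port"
> is misleading now.  Strings can contain the whole codepoint range.
> But string ports can't store the whole range depending on their encoding.
> (That's what the "UTF-8" hack was about.)

Yes, it’s tricky.  The problem is that currently we can send both
textual and binary data to a given port (unlike the R6RS port API, which
judiciously distinguishes textual and binary ports).  Because of that, I
think string ports can’t just use a fixed encoding.

What do you think?

Thanks,
Ludo’.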
> When you write... > > + /* Create a copy of STR in the encoding of Z. */ > + buf = scm_to_stringn (str, &str_len, pt->encoding, > + SCM_FAILED_CONVERSION_ERROR); > + /* FIXME: strdup doesn't do the right thing if BUF contains zeros, but we > + don't know the size in bytes of STR. */ > + c_str = scm_gc_strdup (buf, "strport"); > + free (buf); > > ... isn't the returned value str_len the length in bytes of buf? The (undocumented) ‘scm_to_stringn ()’ returns the number of characters, AFAICS. > I think you could avoid the strdup call, since it could fail, for > example, for UTF-32 strings of more than one character. Yes, that sucks. Probably we need a function to known the number of bytes of a string. Thoughts? > Also, in the big scheme of things, I wonder if the name "string port" > is misleading now. Strings can contain the whole codepoint range. > But string ports can't store the whole range depending on their encoding. > (That's what the "UTF-8" hack was about.) Yes, it’s tricky. The problem is that currently we can send both textual and binary data to a given port (unlike the R6RS port API, which judiciously distinguishes textual and binary ports.) Because of that, I think string ports can’t just use a fixed encoding. What do you think? Thanks, Ludo’.