Hello,

Mike Gran <spk...@yahoo.com> writes:

> When you write...
>
> +  /* Create a copy of STR in the encoding of Z.  */
> +  buf = scm_to_stringn (str, &str_len, pt->encoding,
> +            SCM_FAILED_CONVERSION_ERROR);
> +  /* FIXME: strdup doesn't do the right thing if BUF contains zeros, but we
> +    don't know the size in bytes of STR.  */
> +  c_str = scm_gc_strdup (buf, "strport");
> +  free (buf);
>
> ... isn't the returned value str_len the length in bytes of buf?

The (undocumented) ‘scm_to_stringn ()’ returns the number of characters,
AFAICS.

> I think you could avoid the strdup call, since it could fail, for
> example, for UTF-32 strings of more than one character.

Yes, that sucks.  We probably need a function that tells us the number
of bytes in an encoded string.  Thoughts?
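
To make the truncation concrete, here is a minimal standalone sketch in
plain C.  It is not Guile code: ‘utf32le_ab’, ‘strdup_visible_length’
and ‘copy_bytes’ are names made up for the illustration, and it assumes
the buffer holds "ab" encoded as UTF-32LE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* "ab" encoded as UTF-32LE (assumed for this example): each ASCII
   character is followed by three zero bytes.  */
static const char utf32le_ab[] = { 'a', 0, 0, 0, 'b', 0, 0, 0 };

/* Number of bytes a strdup-style copy would keep: it stops at the
   first zero byte, whatever the real size of BUF is.  */
static size_t
strdup_visible_length (const char *buf)
{
  return strlen (buf);
}

/* Copy exactly LEN bytes of BUF, embedded zeros included.  Returns
   NULL on allocation failure; the caller frees the result.  */
static char *
copy_bytes (const char *buf, size_t len)
{
  char *copy;

  assert (buf != NULL);
  copy = malloc (len ? len : 1);
  if (copy != NULL)
    memcpy (copy, buf, len);
  return copy;
}
```

A strlen-based copy keeps only the first byte of the 8-byte buffer,
whereas a copy driven by an explicit byte count keeps everything.  That
is why the size in bytes has to come from the conversion step itself.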

> Also, in the big scheme of things, I wonder if the name "string port"
> is misleading now.  Strings can contain the whole codepoint range.  
> But string ports can't store the whole range depending on their encoding.
> (That's what the "UTF-8" hack was about.)

Yes, it’s tricky.  The problem is that currently we can send both
textual and binary data to a given port (unlike the R6RS port API, which
judiciously distinguishes textual and binary ports.)  Because of that, I
think string ports can’t just use a fixed encoding.

What do you think?

Thanks,
Ludo’.

