Re: Wide strings status

Ludovic Courtès Wed, 22 Apr 2009 13:08:16 -0700

Hello!

Mike Gran <spk...@yahoo.com> writes:

> On Tue, 2009-04-21 at 23:37 +0200, Ludovic Courtès wrote:

>> You seem to imply that `scm_getc ()' will now return a Unicode
>> codepoint, is that right?  What about `scm_c_{read,write} ()', and
>> `scm_{get,put}s ()'?
>> 
>
> I vacillate on this, but, I think the most logical approach is to have
> scm_getc return codepoints and to have the rest of those functions
> return strings that could contain wide characters.

Hmm, `scm_c_{read,write} ()' are biased toward binary data, according to
the manual and to their prototype (they take `void *' buffers).  So I
would keep them this way.

`scm_puts ()' is more of a concern since it takes a `char *', which the
caller may consider an 8-bit-encoded, null-terminated string.  We should
probably deprecate it, and have it treat its argument as an ISO-8859-1
string, transcoding as necessary.

And `scm_gets ()' doesn't exist actually.  ;-)

> This is if and only
> if the port has been assigned a character encoding.  If it doesn't have
> an associated encoding, ports will be treated as de facto ISO-8859-1,
> where character values between 0 and 255 are stored without any
> interpretation and characters greater than 255 are invalid.  (Unicode
> codepoints 0 to 255 are by design the same as ISO-8859-1.)

OK.
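
To make sure we mean the same thing, here is a rough sketch of the
fallback rule for a port without an encoding.  The function names are
hypothetical and not part of libguile; this only illustrates the
"0 to 255 pass through, anything larger is invalid" behavior described
above, not how the port code would actually be written.

```c
#include <stdint.h>

/* Hypothetical helpers, not libguile API.  They assume a port with no
   associated encoding, treated as de facto ISO-8859-1.  */

/* Reading: each byte maps to the codepoint with the same value, since
   Unicode codepoints 0..255 coincide with ISO-8859-1.  */
uint32_t
latin1_byte_to_codepoint (unsigned char byte)
{
  return (uint32_t) byte;
}

/* Writing: codepoints 0..255 are stored as-is; anything larger cannot
   be represented.  Returns 0 on success, -1 if CODEPOINT is invalid
   for such a port.  */
int
codepoint_to_latin1_byte (uint32_t codepoint, unsigned char *out)
{
  if (codepoint > 0xFF)
    return -1;

  *out = (unsigned char) codepoint;
  return 0;
}
```

With this, U+00E8 round-trips as the single byte 0xE8, while U+0100
would be rejected on output.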

Thanks,
Ludo'.
