Re: need: scm_from_{utf8,latin1}_{string,symbol,keyword}

Mike Gran Mon, 06 Sep 2010 09:29:49 -0700

> From: Andy Wingo <wi...@pobox.com>


[...]

> The solution is to use functions that specify the locale. We don't have
> those yet, but we do have the capability to write them
> now. Specifically:
>
>   scm_from_utf8_string
>   scm_from_utf8_symbol
>   scm_from_utf8_keyword
>
>   scm_from_latin1_string
>   scm_from_latin1_symbol
>   scm_from_latin1_keyword
>
> We probably also need the "n" variants.
> 

[...]

> So then we need, I think:
>
>   scm_to_utf8_string
>   scm_to_utf16_string
>   scm_to_utf32_string
>
> We need the "n" variants here too (perhaps more).

Some of this is already in the bytevectors module, though
perhaps not in a form that is easy to use from C source code.

It would be easy enough to do, but there is a failure case to
consider for scm_from_utf8_string: the C UTF-8 string could
contain incorrectly encoded data.

You could throw an encoding error, or you could replace the
bad UTF-8 with U+FFFD or a question mark.

The bytevector's utf8->string always throws encoding-error.
Maybe that's good enough.

Otherwise, perhaps something like

scm_from_utf8_stringn (str, len, error_or_replace_strategy)
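
To make the two strategies concrete, here is a minimal standalone
sketch in plain C. It does not use libguile and is not how Guile
does it; decode_utf8, decode_one, and the UTF8_ERROR / UTF8_REPLACE
names are hypothetical, made up for illustration. It decodes into an
array of code points and either fails on the first malformed
sequence or substitutes U+FFFD for each bad byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strategy names; not part of any Guile API.  */
enum utf8_strategy { UTF8_ERROR, UTF8_REPLACE };

/* Decode one code point from s[0..len).  Return the number of bytes
   consumed, or 0 if the sequence is malformed or truncated.  */
static size_t
decode_one (const unsigned char *s, size_t len, uint32_t *cp)
{
  size_t need, i;
  uint32_t c, min;

  if (len == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0) { need = 1; c = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0) { need = 2; c = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0) { need = 3; c = s[0] & 0x07; min = 0x10000; }
  else
    return 0;                   /* stray continuation or invalid lead byte */

  if (len < need + 1)
    return 0;                   /* truncated sequence */
  for (i = 1; i <= need; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;               /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  *cp = c;
  return need + 1;
}

/* Decode STR (LEN bytes) into OUT, which holds up to CAP code points.
   Return the number of code points written, or -1 if OUT is too
   small or if STRATEGY is UTF8_ERROR and the input is malformed.  */
long
decode_utf8 (const char *str, size_t len, enum utf8_strategy strategy,
             uint32_t *out, size_t cap)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t i = 0, n = 0;

  assert (out != NULL);
  while (i < len)
    {
      uint32_t cp;
      size_t used = decode_one (s + i, len - i, &cp);

      if (used == 0)
        {
          if (strategy == UTF8_ERROR)
            return -1;
          cp = 0xFFFD;          /* replacement character */
          used = 1;             /* skip one byte and resynchronize */
        }
      if (n == cap)
        return -1;
      out[n++] = cp;
      i += used;
    }
  return (long) n;
}
```

With the bytes "a", 0xFF, "z" as input, the UTF8_ERROR strategy
returns -1, while UTF8_REPLACE yields three code points with U+FFFD
in the middle. A real scm_from_utf8_stringn would presumably throw
a Scheme exception rather than return -1.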

If you don't mind the overhead of calling the somewhat
heavyweight scm_{to,from}_stringn, these could be macros
or inline functions that wrap it.

-Mike
