Ludovic Courtès <ludo <at> gnu.org> writes:

> Yes, that's probably a good idea.  At any rate, we only have
> `scm_to_locale_string ()' currently so it's not too late to add a single
> function with an encoding parameter in lieu of the proposed
> `scm_to_{utf8,utf16,utf32,ucs4,...}_string ()'.
> 
> But first of all, one needs to implement Unicode support.  

FWIW, I have a complete Unicode support library for Guile called GuICU.  It 
lives at http://gano.sourceforge.net.  It works for me but hasn't been 
widely tested.

It is built on the large and cumbersome IBM ICU library.  ICU encodes things 
internally as UTF-16, which I always thought of as a poor idea, since it 
neither allows O(1) seeking of individual codepoints nor works so well with 
UTF-8.
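
To illustrate the seeking problem, here is a minimal sketch in plain C.  It 
is not ICU or Guile code: the names utf16_ref and utf32_ref are made up for 
this example, and the input is assumed to be well-formed.  Because a 
codepoint above U+FFFF takes two UTF-16 code units (a surrogate pair), 
finding the n-th codepoint means scanning from the start, whereas UTF-32 
allows a direct index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of ICU or Guile.
   Return the codepoint at codepoint index N in a UTF-16 buffer of LEN
   code units, or -1 if N is out of range.  Assumes well-formed input.
   The scan from the start of the buffer is what makes this O(n).  */
static int32_t
utf16_ref (const uint16_t *s, size_t len, size_t n)
{
  size_t i = 0;

  while (i < len)
    {
      uint32_t c = s[i];
      size_t width = 1;

      if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len)
        {
          /* High surrogate: combine it with the low surrogate that
             follows to get a codepoint above U+FFFF.  */
          c = 0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00);
          width = 2;
        }

      if (n == 0)
        return (int32_t) c;

      n--;
      i += width;
    }

  return -1;
}

/* Hypothetical helper.  With UTF-32, one element is one codepoint, so
   the lookup is a plain O(1) array index.  */
static int32_t
utf32_ref (const uint32_t *s, size_t len, size_t n)
{
  return n < len ? (int32_t) s[n] : -1;
}
```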

Based on my experience with ICU and putting this library together, and looking 
at what r6rs claims should be the future for Unicode, I really do think that 
UTF-32 is the way to go. 

Alternatively, one could build a string library where strings are represented 
as either u8 or u32 vectors.  If a string function is asked to operate on a 
u32 vector, it will assume a UTF-32 encoding.  If a string function is asked to 
operate on a u8 vector it will either require a locale or, as a fallback, 
treat the string as a raw byte vector.
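
As a rough sketch of what such a dual representation might look like in C 
(the names str_kind, struct str and str_ref are invented for this example 
and are not Guile or GuICU API; locale handling is left out, so the u8 case 
shows only the raw-byte fallback):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag saying which kind of vector backs the string.  */
enum str_kind { STR_U8, STR_U32 };

/* Hypothetical string object: a tag, a length, and one of two vectors.  */
struct str
{
  enum str_kind kind;
  size_t len;                   /* number of elements, not bytes */
  union
  {
    const uint8_t *u8;          /* locale-encoded or raw bytes */
    const uint32_t *u32;        /* assumed to be UTF-32 */
  } data;
};

/* Return element N of S, or -1 if N is out of range.  For a u32 string
   the result is a codepoint.  For a u8 string this sketch returns the
   raw byte; a fuller version would decode it using a locale.  */
static int32_t
str_ref (const struct str *s, size_t n)
{
  if (n >= s->len)
    return -1;

  switch (s->kind)
    {
    case STR_U32:
      return (int32_t) s->data.u32[n];
    case STR_U8:
      return s->data.u8[n];
    }

  return -1;
}
```

Every string function would need a branch like the switch above, once per 
representation.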

This would be twice the work to implement, though.



