Ludovic Courtès <ludo <at> gnu.org> writes:

> Yes, that's probably a good idea.  At any rate, we only have
> `scm_to_locale_string ()' currently so it's not too late to add a single
> function with an encoding parameter in lieu of the proposed
> `scm_to_{utf8,utf16,utf32,ucs4,...}_string ()'.
>
> But first of all, one needs to implement Unicode support.

FWIW, I have a complete Unicode support library for Guile called GuICU. It lives at http://gano.sourceforge.net. It works for me, but it hasn't been widely tested.

It is built on the large and cumbersome IBM ICU library. ICU encodes strings internally as UTF-16, which I always thought was a poor idea, since it neither allows O(1) seeking to individual codepoints nor works well with UTF-8.

Based on my experience with ICU and putting this library together, and looking at what R6RS claims should be the future for Unicode, I really do think that UTF-32 is the way to go.

Alternatively, one could build a string library where strings are represented as either u8 or u32 vectors. If a string function is asked to operate on a u32 vector, it will assume a UTF-32 encoding. If it is asked to operate on a u8 vector, it will either require a locale or, as a fallback, treat the string as a raw byte vector. This would be twice the work to implement, though.
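
To make the u8/u32 idea a bit more concrete, here is a rough C sketch of what such a dual representation might look like. Everything in it (`dual_string`, `dual_string_ref`, `utf16_codepoint_count`) is a made-up name for illustration, not part of Guile, GuICU, or ICU, and the u8 case only shows the raw-byte fallback, with no locale decoding. The UTF-16 helper is there for contrast: it assumes well-formed input and has to scan the whole buffer, because surrogate pairs make the codepoint count differ from the code-unit count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged string: raw bytes or UTF-32 codepoints. */
enum str_kind { STR_U8, STR_U32 };

struct dual_string {
  enum str_kind kind;
  size_t len;                 /* number of elements, not bytes */
  union {
    const uint8_t *u8;        /* raw bytes (fallback, no locale applied) */
    const uint32_t *u32;      /* assumed to be UTF-32 codepoints */
  } data;
};

/* O(1) element access.  For STR_U32 the result is a codepoint;
   for STR_U8 it is just the raw byte widened to 32 bits. */
static uint32_t
dual_string_ref (const struct dual_string *s, size_t i)
{
  assert (i < s->len);
  return s->kind == STR_U32 ? s->data.u32[i] : (uint32_t) s->data.u8[i];
}

/* For contrast: counting codepoints in well-formed UTF-16 is O(n),
   since every low (trailing) surrogate must be skipped. */
static size_t
utf16_codepoint_count (const uint16_t *units, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (units[i] < 0xDC00 || units[i] > 0xDFFF)
      count++;
  return count;
}
```

With a u32 string, asking for element 1 of "H", U+1D11E, "i" hands back U+1D11E directly, whereas the same text in UTF-16 takes four code units for three codepoints, so any index has to be found by scanning.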