() Ludovic Courtès
() Tue, 10 Jun 2008 14:09:33 +0200

   Currently, Guile only supports `scm_to_locale_string ()', which
   means the returned C string is encoded in the current locale's
   encoding. Eventually, new functions may be added:
   `scm_to_utf8_string ()', etc. This was Marius' original plan [0],
   and I think it remains valid.

Most plans are "valid" but not all plans are easy to live with.

I think the encoding of a string (or buffer or "character" array (or
subsequence thereof)) needs to be explicit; the encoding is not purely
"internal" and to treat it as such will require hoop-jumping on both
sides of the API. (How encoding support is implemented, on the other
hand, is indeed an internal affair.)

This is from observation of how Emacs attained multibyte-ness. Note:
not just "how Emacs does it" but "how Emacs used to not do it and
through time eventually came to do it".

In PostgreSQL's multibyte support, the i/o can be tempered by setting
the "client encoding". This can be changed cheaply (per request).
Basing encoding on locale only is not fine-grained enough; setting the
locale can be expensive and cause unrelated changes.

See also GNU libc support (info "(libc) Character Set Handling"),
which applies similar principles at a lower (library) level.

All these programs chose not to expose many conversion functions in
the programming interface. Instead, they expose few functions, each
with an encoding parameter. That is surely a cleaner design.

thi
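
To make the "few functions, each with an encoding parameter" shape
concrete, here is a minimal sketch built on libc's iconv. The name
`convert_string' and its signature are hypothetical (they are not
part of Guile or libc), and the fixed output buffer is a simplifying
assumption rather than a recommendation.

```c
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Guile or libc: convert LEN bytes
   of IN from encoding FROM to encoding TO.  Returns a malloc'd buffer
   and stores the number of bytes written in *OUT_LEN, or returns NULL
   on failure.  A single NUL byte is appended for convenience; it is
   not counted in *OUT_LEN.  */
char *
convert_string (const char *to, const char *from,
                const char *in, size_t len, size_t *out_len)
{
  /* The encodings are plain arguments; the locale is never consulted.  */
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;

  /* Assumption: 4 output bytes per input byte plus some slack covers
     the common encodings.  A real implementation would grow the
     buffer when iconv reports E2BIG instead of giving up.  */
  size_t cap = 4 * len + 16;
  char *out = malloc (cap + 1);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *src = (char *) in;      /* iconv takes a non-const pointer */
  char *dst = out;
  size_t src_left = len;
  size_t dst_left = cap;

  /* Convert the input, then flush any pending shift sequence.  */
  if (iconv (cd, &src, &src_left, &dst, &dst_left) == (size_t) -1
      || iconv (cd, NULL, NULL, &dst, &dst_left) == (size_t) -1)
    {
      free (out);
      iconv_close (cd);
      return NULL;
    }

  iconv_close (cd);
  *dst = '\0';
  *out_len = (size_t) (dst - out);
  return out;
}
```

With this shape, a caller wanting UTF-8 from Latin-1 input writes
`convert_string ("UTF-8", "ISO-8859-1", buf, len, &n)', and support
for another encoding means passing a different name, not adding
another function to the interface.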