Bruce Korb <bk...@gnu.org> writes:
> On 01/07/12 08:13, Mark H Weaver wrote:
>>> Most of the strings that I wind up altering are created with a
>>> scm_from_locale_string() C function call.
>>
>> BTW, beware that scm_from_locale_string() is only appropriate for
>> strings that came from the user (e.g. command-line arguments, reading
>> from a port, etc).  When converting string literals from your own source
>> code, you should use scm_from_latin1_string() or scm_from_utf8_string().
>>
>> Similarly, to make symbols from C string literals, use
>> scm_from_latin1_symbol() or scm_from_utf8_symbol().
>>
>> Caveat: these functions did not exist in Guile 1.8.  If your C string
>> literals are ASCII-only, I guess it won't matter in practice which
>> function you use, although it would be good to spread the understanding
>> that C string literals should not be interpreted according to the user's
>> locale.
>
> I go back to my argument that a facilitation language needs to focus
> on being as helpful as possible.  That means doing what is likely
> wanted instead of throwing errors at every possibility.  It also means
> not changing interfaces.

Sorry, but there's no way to maintain backward compatibility here.  I
know it's a pain, but there's no getting around the fact that in order
to write proper internationalized code, we now need to think carefully
about what encoding a particular string is in.  There's no automatic way
to handle this, not even in principle.

Fortunately, most modern GNU/Linux systems default to a UTF-8 locale, in
which case scm_from_locale_string and scm_from_utf8_string will be the
same anyway.  However, there are still some systems that use a non-UTF-8
locale, and we must strive to support them properly.

> Anyway, this then?  (abbreviated)
>
> #if   GUILE_VERSION < 107000
> # define AG_SCM_STR02SCM(_s)        scm_makfrom0str(_s)
> # define AG_SCM_STR2SCM(_st,_sz)    scm_mem2string(_st,_sz)
>
> #elif GUILE_VERSION < 200000
> # define AG_SCM_STR02SCM(_s)        scm_from_locale_string(_s)
> # define AG_SCM_STR2SCM(_st,_sz)    scm_from_locale_stringn(_st,_sz)
>
> #elif GUILE_VERSION < 200004
> #error "autogen does not work with this version of guile"
>   choke me.

This last clause is wrong.  scm_from_utf8_string and
scm_from_utf8_stringn were in Guile 2.0.0.

> #else
> # define AG_SCM_STR02SCM(_s)        scm_from_utf8_string(_s)
> # define AG_SCM_STR2SCM(_st,_sz)    scm_from_utf8_stringn(_st,_sz)
> #endif

Just remember that this change implies that these macros should only be
used for C string literals, and must _not_ be used for strings supplied
by the user (e.g. command-line arguments and I/O).

It could very well be that you're currently overloading these functions
for both purposes, in which case you should split this pair of macros
into two distinct pairs: one pair of macros for user strings (keep using
scm_from_locale_string{,n} for these), and one pair for C string
literals (use scm_from_utf8_string{,n} for Guile 2.0.0 or newer).  Then
look at each use of these old overloaded macros in your code, and figure
out whether it's operating on a string that came from the user or a
string that came from your own source code.

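A rough sketch of what that split might look like, under some loud
assumptions: the AG_SCM_USER_* and AG_SCM_LIT_* names are hypothetical
(they are not existing autogen macros), GUILE_VERSION is borrowed from
the quoted code, and the HAVE_LIBGUILE guard with its stand-in functions
exists only so the fragment compiles without Guile installed.  The
pre-1.7 branch is left out for brevity.

```c
#include <stddef.h>

#ifdef HAVE_LIBGUILE            /* hypothetical build flag, not from this thread */
# include <libguile.h>
#else
/* Stand-ins (assumptions) so this sketch compiles without Guile.
   They only record which converter a macro expanded to. */
typedef const char *SCM;
static const char *last_converter = "";
static inline SCM scm_from_locale_string(const char *s)
{ last_converter = "locale"; return s; }
static inline SCM scm_from_locale_stringn(const char *s, size_t n)
{ (void)n; last_converter = "locale"; return s; }
static inline SCM scm_from_utf8_string(const char *s)
{ last_converter = "utf8"; return s; }
static inline SCM scm_from_utf8_stringn(const char *s, size_t n)
{ (void)n; last_converter = "utf8"; return s; }
#endif

#ifndef GUILE_VERSION
# define GUILE_VERSION 200000   /* assumed default, for this sketch only */
#endif

/* Strings supplied by the user (argv, I/O): always decode per the locale. */
#define AG_SCM_USER_STR02SCM(_s)       scm_from_locale_string(_s)
#define AG_SCM_USER_STR2SCM(_st,_sz)   scm_from_locale_stringn(_st,_sz)

/* C string literals from our own source: UTF-8 where Guile provides it. */
#if GUILE_VERSION < 200000
# define AG_SCM_LIT_STR02SCM(_s)       scm_from_locale_string(_s)
# define AG_SCM_LIT_STR2SCM(_st,_sz)   scm_from_locale_stringn(_st,_sz)
#else
# define AG_SCM_LIT_STR02SCM(_s)       scm_from_utf8_string(_s)
# define AG_SCM_LIT_STR2SCM(_st,_sz)   scm_from_utf8_stringn(_st,_sz)
#endif
```

With something like this, a literal would go through
AG_SCM_LIT_STR02SCM("some literal"), while a command-line argument would
go through AG_SCM_USER_STR02SCM(argv[i]).  On Guile 1.8 both pairs fall
back to the locale functions, since the UTF-8 ones did not exist there.
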
Again, I stress that this has nothing to do with Guile.  All software,
if it wishes to be properly internationalized, needs to think about
where a string came from.  In general, your program's source code (and
thus the C string literals it contains) will have a different encoding
than C strings that come from the user.  C strings of different
encodings are essentially of different types (even though C's type
system is too crude to distinguish them), and you must treat them as
such.

     Mark