-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

William Kyngesburye wrote:
> So, sqlite supports UTF8 directly - UTF8 in, UTF8 out.

No. SQLite supports Unicode internally. The APIs let you supply and
receive Unicode strings in UTF8 and UTF16. The actual encoding
serialized to disk depends on a number of factors, but is also
irrelevant to API usage, as the API will accept or supply them in the
encoding you request (UTF8, UTF16).

> And then, ICU adds internal unicode sorting, searching and case
> conversion.

The builtin SQLite sorting and case conversion only know about US
ASCII. Case conversion just leaves other codepoints alone, and sorting
is by codepoint. ICU lets you specify the locale:

    -- standard sqlite
    select upper('instant text');
    -- with ICU
    select upper('instant text', 'tr_TR');

The former will give "INSTANT TEXT" while the latter gives
"İNSTANT TEXT" (note the dot on top of the I).

> The spatialite unicode support seems to be conversion routines to/from
> UTF8 in the shell when the shell uses some other encoding.

The shell appears to be UTF8 - it seems to make no effort to do
character set conversion. (It also has a number of escaping options
such as CSV, HTML, C style backslashes.) However, it really only does
codepoints less than 255. The various output routines treat the
strings as a sequence of bytes and make output decisions on a byte by
byte basis. This means, for example, that if there is a multibyte utf8
sequence, the subsequent bytes will not be treated as part of a utf8
encoded codepoint. Basically, not getting mangled output when using
non-latin1 codepoints is a matter of luck.

> I'm
> more interested in the library. I'll have to play with it a bit to
> see for sure.

The library does unicode, only unicode, full stop, end of story.

The SQLite apis that take or supply text usually come in two variants.
If there is a "16" suffix then UTF16 is used, else UTF8 is used. Use
whichever variant is most convenient for you. The underlying behaviour
is identical. In general Windows programmers will find the UTF16
variant more useful as the Windows API has been unicode since Windows
NT(*).
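
For what it's worth, the point that the on-disk encoding is irrelevant
to API usage is easy to check from a scripting binding. What follows
is only a sketch, assuming Python's standard sqlite3 module (my choice
for brevity, not something from this thread) and an SQLite build
without the ICU extension. The helper names roundtrip and
builtin_upper are made up for illustration. It picks the on-disk
encoding with PRAGMA encoding, stores a string, and reads it back
unchanged whichever encoding was used.

```python
import sqlite3


def roundtrip(encoding, text):
    """Store text in a fresh in-memory database created with the given
    on-disk encoding; return (encoding SQLite reports, text read back)."""
    con = sqlite3.connect(":memory:")
    # The encoding can only be chosen before the database has any content.
    con.execute("PRAGMA encoding = '%s'" % encoding)
    con.execute("CREATE TABLE t (s TEXT)")
    con.execute("INSERT INTO t VALUES (?)", (text,))
    reported = con.execute("PRAGMA encoding").fetchone()[0]
    stored = con.execute("SELECT s FROM t").fetchone()[0]
    con.close()
    return reported, stored


def builtin_upper(text):
    """Apply SQLite's builtin one-argument upper() to text."""
    con = sqlite3.connect(":memory:")
    result = con.execute("SELECT upper(?)", (text,)).fetchone()[0]
    con.close()
    return result


if __name__ == "__main__":
    sample = "instant text \u00e9 \u0130"
    for enc in ("UTF-8", "UTF-16le", "UTF-16be"):
        reported, stored = roundtrip(enc, sample)
        print(enc, reported, stored == sample)
    # Without ICU, only the ASCII letters are expected to change case.
    print(builtin_upper(sample))
```

The two-argument upper(text, locale) form needs an ICU-enabled build,
so it is deliberately left out of this sketch.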
Linux programmers will find the UTF8 variant more useful since that is
what other Linux apis tend to use. I have no idea about Mac. Note that
there is no problem using both variants in the same program - I
regularly do!

(*) There are also legacy versions that take bytes in the local code
page, and then usually convert to Unicode and call the Unicode version
of the API. Windows 9x had an amusing library named unicows that did
the opposite - it took calls to the Unicode system apis, converted
them to the local code page, and called the code page versions, since
the internals were not unicode.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkkGnIsACgkQmOOfHg372QQz8ACeKaahVpynXD51yVJH2LXsHl++
P2YAoLpXceo492DgQmq2dgabCvL6XuHW
=kgA7
-----END PGP SIGNATURE-----
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users