Re: [sqlite] UTF8 support?

2008-10-27 Thread Roger Binns

William Kyngesburye wrote:
> So, sqlite supports UTF8 directly - UTF8 in, UTF8 out.  

No.  SQLite supports Unicode internally.  The APIs let you supply and
receive Unicode strings in UTF8 and UTF16.  The encoding actually
serialized to disk is fixed when the database is created (by PRAGMA
encoding, or by default from whether sqlite3_open or sqlite3_open16
created it), but it is irrelevant to API usage: the API will accept or
supply strings in whichever encoding you request (UTF8, UTF16).
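A minimal sketch of that point, using Python's built-in sqlite3 module rather than the C API (an assumption made only so the example is self-contained). It creates in-memory databases whose on-disk encoding is set with PRAGMA encoding and shows the same string coming back unchanged each time. The helper name `roundtrip` is made up for illustration.

```python
import sqlite3


def roundtrip(db_encoding: str, text: str) -> tuple:
    """Store text in a fresh in-memory database that uses the given on-disk
    encoding, then read it back.  Returns (reported encoding, text read back)."""
    conn = sqlite3.connect(":memory:")
    try:
        # The encoding can only be chosen before the database is created,
        # i.e. before the first table exists.  db_encoding is a trusted literal here.
        conn.execute(f"PRAGMA encoding = '{db_encoding}'")
        conn.execute("CREATE TABLE t (s TEXT)")
        conn.execute("INSERT INTO t VALUES (?)", (text,))
        reported = conn.execute("PRAGMA encoding").fetchone()[0]
        stored = conn.execute("SELECT s FROM t").fetchone()[0]
        return reported, stored
    finally:
        conn.close()


if __name__ == "__main__":
    for enc in ("UTF-8", "UTF-16le", "UTF-16be"):
        # The text comes back unchanged whichever encoding is used on disk.
        print(roundtrip(enc, "İNSTANT TEXT €"))
```

The caller never sees the on-disk encoding; it only changes how the bytes are laid out in the file.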

> And then, ICU adds internal unicode sorting, searching and case  
> conversion.

The builtin SQLite sorting and case conversion only know about US ASCII:
case conversion leaves all other codepoints alone, and sorting is by
codepoint.  ICU lets you specify the locale:

  -- standard SQLite
  select upper('instant text');
  -- with ICU
  select upper('instant text', 'tr_TR');

The former gives "INSTANT TEXT" while the latter gives "İNSTANT
TEXT" (note the dot above the capital I).
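The built-in, non-ICU behaviour can be probed with a short sketch using Python's sqlite3 module. It does not load ICU, so it cannot reproduce the tr_TR case; whether your Python's SQLite was compiled with SQLITE_ENABLE_ICU is an assumption the code checks through PRAGMA compile_options rather than takes for granted. The function name `builtin_text_behaviour` is hypothetical.

```python
import sqlite3


def builtin_text_behaviour() -> dict:
    """Probe SQLite's built-in case handling and default sort order."""
    conn = sqlite3.connect(":memory:")
    try:
        # Compile-time options are reported without the SQLITE_ prefix.
        opts = {row[0] for row in conn.execute("PRAGMA compile_options")}
        upper_ascii, upper_accented = conn.execute(
            "SELECT upper('instant text'), upper('é')"
        ).fetchone()
        # The NOCASE collation folds only the ASCII letters A-Z.
        nocase_ascii, nocase_accented = conn.execute(
            "SELECT 'a' = 'A' COLLATE NOCASE, 'é' = 'É' COLLATE NOCASE"
        ).fetchone()
        conn.execute("CREATE TABLE t (s TEXT)")
        conn.executemany("INSERT INTO t VALUES (?)", [("b",), ("é",), ("a",), ("Z",)])
        # The default BINARY collation compares the encoded bytes; in a UTF-8
        # database that amounts to codepoint order.
        order = [row[0] for row in conn.execute("SELECT s FROM t ORDER BY s")]
    finally:
        conn.close()
    return {
        "icu_compiled_in": "ENABLE_ICU" in opts,
        "upper_ascii": upper_ascii,
        "upper_accented": upper_accented,
        "nocase_ascii": nocase_ascii,
        "nocase_accented": nocase_accented,
        "order": order,
    }


if __name__ == "__main__":
    print(builtin_text_behaviour())
```

Without ICU, upper('é') comes back as 'é', and 'Z' sorts before 'a' because uppercase ASCII has lower codepoints.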

> The spatialite unicode support seems to be conversion routines to/from  
> UTF8 in the shell when the shell uses some other encoding.  

The shell appears to be UTF8 - it seems to make no effort at character
set conversion.  (It also has a number of escaping options such as CSV,
HTML and C style backslashes.)  However it really only handles
codepoints below 256.  The various output routines treat strings as a
sequence of bytes and make output decisions on a byte by byte basis.
This means, for example, that in a multibyte UTF8 sequence the
continuation bytes are not treated as part of a single UTF8 encoded
codepoint.  Basically, getting unmangled output when using non-Latin-1
codepoints is a matter of luck.
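To see why byte-by-byte handling goes wrong, here is a small sketch in Python. It does not reproduce the shell's actual output code; `per_byte_view` models a hypothetical byte-oriented routine that treats each byte as its own Latin-1 codepoint. SQLite's length() counts characters, while casting the text to a BLOB exposes its bytes (UTF8 here, since that is the default database encoding).

```python
import sqlite3


def bytes_vs_characters(text: str) -> dict:
    """Contrast SQLite's character count with the UTF-8 byte count, and show
    the text as a byte-at-a-time routine would see it (hypothetical model)."""
    conn = sqlite3.connect(":memory:")
    try:
        chars, nbytes = conn.execute(
            "SELECT length(?), length(CAST(? AS BLOB))", (text, text)
        ).fetchone()
    finally:
        conn.close()
    raw = text.encode("utf-8")
    # Each byte becomes its own codepoint, so multibyte sequences split apart.
    per_byte_view = "".join(chr(b) for b in raw)
    return {"chars": chars, "bytes": nbytes, "per_byte_view": per_byte_view}


if __name__ == "__main__":
    print(bytes_vs_characters("café"))
```

Plain ASCII survives unchanged, but "é" (two UTF8 bytes) turns into "Ã©" once its bytes are handled separately.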

> I'm  
> more interested in the library.  I'll have to play with it a bit to  
> see for sure.

The library does Unicode, only Unicode, full stop, end of story.  The
SQLite APIs that take or supply text usually come in two variants.  If
there is a "16" in the name (e.g. sqlite3_prepare16_v2 versus
sqlite3_prepare_v2) then UTF16 is used, otherwise UTF8 is used.  Use
whichever variant is most convenient for you; the underlying behaviour
is identical.  In general Windows programmers will find the UTF16
variants more useful, as the Windows API has been Unicode since Windows
NT(*).  Linux programmers will find the UTF8 variants more useful, since
that is what other Linux APIs tend to use.  I have no idea about Mac.
Note that there is no problem using both variants in the same program -
I regularly do!
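The sketch below does not call SQLite at all; it only uses Python's codecs to show what the two kinds of variant exchange for the same string. The table lists a few real pairs from the SQLite C interface (the "16" variants use UTF16 in the machine's native byte order); the helper `variant_buffers` is hypothetical.

```python
import sys

# A few real pairs from the SQLite C interface: the plain name works in UTF-8,
# the name containing "16" works in UTF-16 (native byte order).
UTF8_AND_UTF16_VARIANTS = {
    "sqlite3_open": "sqlite3_open16",
    "sqlite3_prepare_v2": "sqlite3_prepare16_v2",
    "sqlite3_bind_text": "sqlite3_bind_text16",
    "sqlite3_column_text": "sqlite3_column_text16",
    "sqlite3_errmsg": "sqlite3_errmsg16",
}


def variant_buffers(text: str) -> tuple:
    """Hypothetical helper: encode text as each kind of variant expects it.
    Returns (utf8_bytes, utf16_bytes, utf16_codec_name)."""
    codec16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    return text.encode("utf-8"), text.encode(codec16), codec16


if __name__ == "__main__":
    for s in ("text", "İ", "€", "\U0001F600"):
        utf8, utf16, codec16 = variant_buffers(s)
        # Different bytes, same characters once decoded.
        assert utf8.decode("utf-8") == utf16.decode(codec16) == s
        print(repr(s), len(utf8), "UTF-8 bytes,", len(utf16), "UTF-16 bytes")
```

Neither encoding is "smaller" in general: "€" takes 3 bytes in UTF8 but 2 in UTF16, while ASCII text is half the size in UTF8.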

(*) There are also legacy versions that take bytes in the local code
page; they usually convert to Unicode and call the Unicode version of
the API.  Windows 9x had an amusing library named unicows that did the
opposite - it took calls to the Unicode system APIs, converted the
strings to the local code page and called the code page versions, since
the Windows 9x internals were not Unicode.

Roger
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF8 support?

2008-10-27 Thread William Kyngesburye
On Oct 27, 2008, at 10:23 AM, MikeW wrote:

> Search the newsgroup ... start here:
> http://thread.gmane.org/gmane.comp.db.sqlite.general/41826/focus=41843
>
> Regards,
> MikeW


So, sqlite supports UTF8 directly - UTF8 in, UTF8 out.  I suppose this  
applies to the shell program also?

And then, ICU adds internal unicode sorting, searching and case  
conversion.

The spatialite unicode support seems to be conversion routines to/from  
UTF8 in the shell when the shell uses some other encoding.  I guess  
this doesn't worry me since OSX defaults to UTF8 in the shell, and I'm  
more interested in the library.  I'll have to play with it a bit to  
see for sure.

-
William Kyngesburye 
http://www.kyngchaos.com/

"We are at war with them. Neither in hatred nor revenge and with no  
particular pleasure I shall kill every ___ I can until the war is  
over. That is my duty."

"Don't you even hate 'em?"

"What good would it do if I did? If all the many millions of people of  
the allied nations devoted an entire year exclusively to hating the  
 it wouldn't kill one ___ nor shorten the war one day."

 "And it might give 'em all stomach ulcers."

- Tarzan, on war



Re: [sqlite] UTF8 support?

2008-10-27 Thread MikeW
William Kyngesburye <[EMAIL PROTECTED]> writes:

> 
> Does SQLite support UTF8 directly?  Or is this what the ICU extension  
> is for?  Does the sqlite3 shell program support UTF8?
> 
> There is this spatialite extension which includes a modified sqlite3  
> shell program that "implements full UNICODE support".  So I'm a little  
> confused.
> 
> -
> William Kyngesburye 
>

Search the newsgroup ... start here:
http://thread.gmane.org/gmane.comp.db.sqlite.general/41826/focus=41843

Regards,
MikeW






[sqlite] UTF8 support?

2008-10-27 Thread William Kyngesburye
Does SQLite support UTF8 directly?  Or is this what the ICU extension  
is for?  Does the sqlite3 shell program support UTF8?

There is this spatialite extension which includes a modified sqlite3  
shell program that "implements full UNICODE support".  So I'm a little  
confused.

-
William Kyngesburye 
http://www.kyngchaos.com/

Earth: "Mostly harmless"

- revised entry in the HitchHiker's Guide to the Galaxy

