Re: [sqlite] UTF-16 API a second class citizen?
"Vinnie"wrote in message news:621704.45397...@web58202.mail.re3.yahoo.com > To be honest, thinking about character > encodings gives me a large headache "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" http://www.joelonsoftware.com/articles/Unicode.html Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
On Tue, Apr 7, 2009 at 7:04 PM, Vinniewrote: > On the other hand there is some legacy data that I want to store > using UTF-8. For these fields I will use sqlite3_bind_text(). It is > possible that in a single INSERT statement there are both UTF-16 > and UTF-8 (wchar_t and char) fields present. > > At no point am I ever constructing SQL statements using a printf() > style conversion on field data to create the statement. > > Am I vulnerable to a performance penalty because of conversions in this > scenario? There is a lot of debate as to whether you lose more in the expanded storage needs of UTF-16 than you gain in not having to decode the UTF-8, even for in-memory applications. For databases, it is almost always a win to use UTF-8, because you can decode a huge amount of UTF-8 for the cost of a single seek (bigger databases means more seeks). Unless your data is in that subset which codes more space efficiently in UTF-16 than UTF-8. -scott ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
On Tue, Apr 07, 2009 at 08:28:24PM -0700, Vinnie wrote: > > > Note that both UTF-8 and UTF-16 are capable of representing > > the full range of Unicode characters. Conversion between the two is > > lossless. You seem to be under impression that UTF-8 is somehow > > deficient, only suitable for "legacy" encoding. This is not the > > case. > > Yeah thats what they say...but if thats the case then why use UTF-16 > at all? What is the benefit for supporting UNICODE? Why is there Because, for example, some systems supported UCS-2 to begin with, and UTF-16 is the upgrade path from UCS-2, and as painless/painful as the ASCII->UTF-8 upgrade path. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
> Note that both UTF-8 and UTF-16 are capable of representing > the full range of Unicode characters. Conversion between the two is > lossless. You seem to be under impression that UTF-8 is somehow > deficient, only suitable for "legacy" encoding. This is not the > case. Yeah thats what they say...but if thats the case then why use UTF-16 at all? What is the benefit for supporting UNICODE? Why is there UTF-16 support in SQLite? To be honest, thinking about character encodings gives me a large headache even though I've been programming for decades. I figured that supporting wide characters will be somehow beneficial for international users...I hope I was not mistaken but it was not a small amount of extra work. Although I have (hopefully) written everything to work with narrow characters by flipping a switch. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
"Vinnie"wrote in message news:320060.55321...@web58204.mail.re3.yahoo.com > However, I have table fields which will be UTF-16. For example, > filenames that have to support international character sets. Or > metadata fields that use different character sets (UNICODE). For > these I am using sqlite3_bind_text16() and passing an appropriate > wchar_t buffer. > > On the other hand there is some legacy data that I want to store > using UTF-8. For these fields I will use sqlite3_bind_text(). It is > possible that in a single INSERT statement there are both UTF-16 and > UTF-8 (wchar_t and char) fields present. Note that both UTF-8 and UTF-16 are capable of representing the full range of Unicode characters. Conversion between the two is lossless. You seem to be under impression that UTF-8 is somehow deficient, only suitable for "legacy" encoding. This is not the case. Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
Vinnie wrote: > PRAGMA statements, I see what you mean now. This is exactly what I needed, > thanks a lot. > > To clarify what I am doing, my SQL statements are in UTF-8 and they are all > prepared, with parameter bindings. So table names, column names, etc.. are > all UTF-8. > > However, I have table fields which will be UTF-16. For example, filenames > that have to support international character sets. Or metadata fields that > use different character sets (UNICODE). For these I am using > sqlite3_bind_text16() and passing an appropriate wchar_t buffer. > > On the other hand there is some legacy data that I want to store using UTF-8. > For these fields I will use sqlite3_bind_text(). It is possible that in a > single INSERT statement there are both UTF-16 and UTF-8 (wchar_t and char) > fields present. > > At no point am I ever constructing SQL statements using a printf() style > conversion on field data to create the statement. > > Am I vulnerable to a performance penalty because of conversions in this > scenario? > > Thanks > >From what I can see, i'd say no, binds are converted once (iirc) and referenced from there on forwards, quoted strings are generally where the performance hit comes in, as they're converted every step of the vm (so for a select statement, every sqlite3_step over the result set). > > >> Igor Tandetnik wrote: >> >>> You can mix and match encodings in your application. >>> >> The database >> >>> encoding determines how strings are actually stored in >>> >> the file (and >> >>> it's database-wide, not per table). SQLite API >>> >> converts back and forth >> >>> as necessary. >>> >>> >> Very inneficiently, but yes, it does. I suggest to the OP >> to use >> parameterised queries if you need to use string values, >> otherwise, >> you'll see significant overhead from conversions back >> and forth between >> utf8 and utf16 inside the sqlite code. >> >>> Igor Tandetnik >>> > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > -- Bill King, Software Engineer Qt Software, Nokia Pty Ltd Brisbane Office ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
PRAGMA statements, I see what you mean now. This is exactly what I needed, thanks a lot. To clarify what I am doing, my SQL statements are in UTF-8 and they are all prepared, with parameter bindings. So table names, column names, etc.. are all UTF-8. However, I have table fields which will be UTF-16. For example, filenames that have to support international character sets. Or metadata fields that use different character sets (UNICODE). For these I am using sqlite3_bind_text16() and passing an appropriate wchar_t buffer. On the other hand there is some legacy data that I want to store using UTF-8. For these fields I will use sqlite3_bind_text(). It is possible that in a single INSERT statement there are both UTF-16 and UTF-8 (wchar_t and char) fields present. At no point am I ever constructing SQL statements using a printf() style conversion on field data to create the statement. Am I vulnerable to a performance penalty because of conversions in this scenario? Thanks > Igor Tandetnik wrote: > > You can mix and match encodings in your application. > The database > > encoding determines how strings are actually stored in > the file (and > > it's database-wide, not per table). SQLite API > converts back and forth > > as necessary. > > > Very inneficiently, but yes, it does. I suggest to the OP > to use > parameterised queries if you need to use string values, > otherwise, > you'll see significant overhead from conversions back > and forth between > utf8 and utf16 inside the sqlite code. > > Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
Igor Tandetnik wrote: > You can mix and match encodings in your application. The database > encoding determines how strings are actually stored in the file (and > it's database-wide, not per table). SQLite API converts back and forth > as necessary. > Very inneficiently, but yes, it does. I suggest to the OP to use parameterised queries if you need to use string values, otherwise, you'll see significant overhead from conversions back and forth between utf8 and utf16 inside the sqlite code. > Igor Tandetnik > > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > -- Bill King, Software Engineer Qt Software, Nokia Pty Ltd Brisbane Office ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
Vinniewrote: >> From: "Igor Tandetnik" >> You could convert your file name from UTF-16 to UTF-8, then >> call sqlite3_open_v2. > > Converting the file name is no problem. But I thought that depending > on how you opened the database (open16 versus open_v2), SQL treats > your strings differently. I don't care about the encoding used to > pass the filename, I care about the strings in my table rows. You can specify encoding explicitly, with "PRAGMA encoding" statement, right after the database is created (or rather, right after you call sqlite3_open_* on a non-existent file and before you issue any other statement; the database is not actually created until the first statement needs to write to the file). > Or does the encoding for the file name used to open the database not > matter to subsequent SQLite SQL statements? Can I mix and match UTF-8 > and UTF-16 in a table or across multiple tables? You can mix and match encodings in your application. The database encoding determines how strings are actually stored in the file (and it's database-wide, not per table). SQLite API converts back and forth as necessary. Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
> From: "Igor Tandetnik"> You could convert your file name from UTF-16 to UTF-8, then > call sqlite3_open_v2. Converting the file name is no problem. But I thought that depending on how you opened the database (open16 versus open_v2), SQL treats your strings differently. I don't care about the encoding used to pass the filename, I care about the strings in my table rows. Or does the encoding for the file name used to open the database not matter to subsequent SQLite SQL statements? Can I mix and match UTF-8 and UTF-16 in a table or across multiple tables? > See PRAGMA user_version > (http://sqlite.org/pragma.html#version) - it's > designed specifically to do this sort of thing. Yes I see, thank you very much. This is exactly what I am already trying to with my VERSION table. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
> Dear Group: > > When my application launches I want to open the associated database, and if > that fails because the file does not exist then I would create a new > database. > > sqlite3_open_v2() is ideal for this purpose because you can leave out > SQLITE_OPEN_CREATE flag, and specify SQLITE_OPEN_READWRITE. > > Unfortunately, this is all academic because I am using sqlite3_open16()! > Where is the UTF-16 version that accepts the flags as a parameter? How can I > achieve the same functionality? Let me add that I am not too keen on > modifying sqlite.c so thats not an option (too much hassle when new versions > come out). > > How did this oversight happen? And what is the workaround? How can I tell, > after a call to sqlite3_open16() if the database was created? The first thing > I do when it opens is SELECT * FROM VERSION to see if I have to upgrade the > database to a new version of my application data. I guess that call could > fail and that would be my clue to create all the tables. > > But what if the SELECT fails for a different reason? How do I distinguish it? > How do I make this robust? > > Thanks! I would assume the intent is that you convert your UTF-16 filename to UTF-8, and then call sqlite3_open_v2(). I don't know what platform you are running on, but you may have some conversion APIs available to you. If not, unicode.org provides some nice sample code that performs the conversion: http://unicode.org/faq/utf_bom.html#UTF8 ~Eric ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UTF-16 API a second class citizen?
Vinniewrote: > Unfortunately, this is all academic because I am using > sqlite3_open16()! Where is the UTF-16 version that accepts the flags > as a parameter? How can I achieve the same functionality? You could convert your file name from UTF-16 to UTF-8, then call sqlite3_open_v2. > How did this oversight happen? And what is the workaround? How can I > tell, after a call to sqlite3_open16() if the database was created? > The first thing I do when it opens is SELECT * FROM VERSION to see if > I have to upgrade the database to a new version of my application > data. I guess that call could fail and that would be my clue to > create all the tables. See PRAGMA user_version (http://sqlite.org/pragma.html#version) - it's designed specifically to do this sort of thing. A freshly created database will have user_version of 0. You would normally set it to some positive number (internally meaningful to your application - SQLite doesn't care) after creating your tables. Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users