Re: [sqlite] UTF-16 API a second class citizen?

2009-04-08 Thread Igor Tandetnik
"Vinnie"  wrote in
message news:621704.45397...@web58202.mail.re3.yahoo.com
> To be honest, thinking about character
> encodings gives me a large headache

"The Absolute Minimum Every Software Developer Absolutely, Positively 
Must Know About Unicode and Character Sets (No Excuses!)"
http://www.joelonsoftware.com/articles/Unicode.html

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Scott Hess
On Tue, Apr 7, 2009 at 7:04 PM, Vinnie  wrote:
> On the other hand there is some legacy data that I want to store
> using UTF-8. For these fields I will use sqlite3_bind_text(). It is
> possible that in a single INSERT statement there are both UTF-16
> and UTF-8 (wchar_t and char) fields present.
>
> At no point am I ever constructing SQL statements using a printf()
> style conversion on field data to create the statement.
>
> Am I vulnerable to a performance penalty because of conversions in this 
> scenario?

There is a lot of debate as to whether you lose more in the expanded
storage needs of UTF-16 than you gain in not having to decode the
UTF-8, even for in-memory applications.  For databases, it is almost
always a win to use UTF-8, because you can decode a huge amount of
UTF-8 for the cost of a single seek (bigger databases means more
seeks).  Unless your data is in that subset which codes more space
efficiently in UTF-16 than UTF-8.

-scott
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Nicolas Williams
On Tue, Apr 07, 2009 at 08:28:24PM -0700, Vinnie wrote:
> 
> > Note that both UTF-8 and UTF-16 are capable of representing
> > the full range of Unicode characters. Conversion between the two is
> > lossless. You seem to be under impression that UTF-8 is somehow
> > deficient, only suitable for "legacy" encoding. This is not the
> > case.
> 
> Yeah thats what they say...but if thats the case then why use UTF-16
> at all? What is the benefit for supporting UNICODE? Why is there

Because, for example, some systems supported UCS-2 to begin with, and
UTF-16 is the upgrade path from UCS-2, and as painless/painful as the
ASCII->UTF-8 upgrade path.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Vinnie

> Note that both UTF-8 and UTF-16 are capable of representing
> the full range of Unicode characters. Conversion between the two is
> lossless. You seem to be under impression that UTF-8 is somehow
> deficient, only suitable for "legacy" encoding. This is not the
> case.

Yeah thats what they say...but if thats the case then why use UTF-16 at all? 
What is the benefit for supporting UNICODE? Why is there UTF-16 support in 
SQLite? To be honest, thinking about character encodings gives me a large 
headache even though I've been programming for decades. I figured that 
supporting wide characters will be somehow beneficial for international 
users...I hope I was not mistaken but it was not a small amount of extra work. 
Although I have (hopefully) written everything to work with narrow characters 
by flipping a switch.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Igor Tandetnik
"Vinnie"  wrote in
message news:320060.55321...@web58204.mail.re3.yahoo.com
> However, I have table fields which will be UTF-16. For example,
> filenames that have to support international character sets. Or
> metadata fields that use different character sets (UNICODE). For
> these I am using sqlite3_bind_text16() and passing an appropriate
> wchar_t buffer.
>
> On the other hand there is some legacy data that I want to store
> using UTF-8. For these fields I will use sqlite3_bind_text(). It is
> possible that in a single INSERT statement there are both UTF-16 and
> UTF-8 (wchar_t and char) fields present.

Note that both UTF-8 and UTF-16 are capable of representing the full 
range of Unicode characters. Conversion between the two is lossless. You 
seem to be under impression that UTF-8 is somehow deficient, only 
suitable for "legacy" encoding. This is not the case.

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Bill KING
Vinnie wrote:
> PRAGMA statements, I see what you mean now. This is exactly what I needed, 
> thanks a lot.
>
> To clarify what I am doing, my SQL statements are in UTF-8 and they are all 
> prepared, with parameter bindings. So table names, column names, etc.. are 
> all UTF-8.
>
> However, I have table fields which will be UTF-16. For example, filenames 
> that have to support international character sets. Or metadata fields that 
> use different character sets (UNICODE). For these I am using 
> sqlite3_bind_text16() and passing an appropriate wchar_t buffer.
>
> On the other hand there is some legacy data that I want to store using UTF-8. 
> For these fields I will use sqlite3_bind_text(). It is possible that in a 
> single INSERT statement there are both UTF-16 and UTF-8 (wchar_t and char) 
> fields present.
>
> At no point am I ever constructing SQL statements using a printf() style 
> conversion on field data to create the statement.
>
> Am I vulnerable to a performance penalty because of conversions in this 
> scenario?
>
> Thanks
>   
>From what I can see, i'd say no, binds are converted once (iirc) and
referenced from there on forwards, quoted strings are generally where
the performance hit comes in, as they're converted every step of the vm
(so for a select statement, every sqlite3_step over the result set).
>
>   
>> Igor Tandetnik wrote:
>> 
>>> You can mix and match encodings in your application.
>>>   
>> The database 
>> 
>>> encoding determines how strings are actually stored in
>>>   
>> the file (and 
>> 
>>> it's database-wide, not per table). SQLite API
>>>   
>> converts back and forth 
>> 
>>> as necessary.
>>>   
>>>   
>> Very inneficiently, but yes, it does. I suggest to the OP
>> to use
>> parameterised queries if you need to use string values,
>> otherwise,
>> you'll see significant overhead from conversions back
>> and forth between
>> utf8 and utf16 inside the sqlite code.
>> 
>>> Igor Tandetnik 
>>>   
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>   


-- 
Bill King, Software Engineer
Qt Software, Nokia Pty Ltd
Brisbane Office

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Vinnie

PRAGMA statements, I see what you mean now. This is exactly what I needed, 
thanks a lot.

To clarify what I am doing, my SQL statements are in UTF-8 and they are all 
prepared, with parameter bindings. So table names, column names, etc.. are all 
UTF-8.

However, I have table fields which will be UTF-16. For example, filenames that 
have to support international character sets. Or metadata fields that use 
different character sets (UNICODE). For these I am using sqlite3_bind_text16() 
and passing an appropriate wchar_t buffer.

On the other hand there is some legacy data that I want to store using UTF-8. 
For these fields I will use sqlite3_bind_text(). It is possible that in a 
single INSERT statement there are both UTF-16 and UTF-8 (wchar_t and char) 
fields present.

At no point am I ever constructing SQL statements using a printf() style 
conversion on field data to create the statement.

Am I vulnerable to a performance penalty because of conversions in this 
scenario?

Thanks


> Igor Tandetnik wrote:
> > You can mix and match encodings in your application.
> The database 
> > encoding determines how strings are actually stored in
> the file (and 
> > it's database-wide, not per table). SQLite API
> converts back and forth 
> > as necessary.
> >   
> Very inneficiently, but yes, it does. I suggest to the OP
> to use
> parameterised queries if you need to use string values,
> otherwise,
> you'll see significant overhead from conversions back
> and forth between
> utf8 and utf16 inside the sqlite code.
> > Igor Tandetnik 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Bill KING
Igor Tandetnik wrote:
> You can mix and match encodings in your application. The database 
> encoding determines how strings are actually stored in the file (and 
> it's database-wide, not per table). SQLite API converts back and forth 
> as necessary.
>   
Very inneficiently, but yes, it does. I suggest to the OP to use
parameterised queries if you need to use string values, otherwise,
you'll see significant overhead from conversions back and forth between
utf8 and utf16 inside the sqlite code.
> Igor Tandetnik 
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>   


-- 
Bill King, Software Engineer
Qt Software, Nokia Pty Ltd
Brisbane Office

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Igor Tandetnik
Vinnie  wrote:
>> From: "Igor Tandetnik" 
>> You could convert your file name from UTF-16 to UTF-8, then
>> call sqlite3_open_v2.
>
> Converting the file name is no problem. But I thought that depending
> on how you opened the database (open16 versus open_v2), SQL treats
> your strings differently. I don't care about the encoding used to
> pass the filename, I care about the strings in my table rows.

You can specify encoding explicitly, with "PRAGMA encoding" statement, 
right after the database is created (or rather, right after you call 
sqlite3_open_* on a non-existent file and before you issue any other 
statement; the database is not actually created until the first 
statement needs to write to the file).

> Or does the encoding for the file name used to open the database not
> matter to subsequent SQLite SQL statements? Can I mix and match UTF-8
> and UTF-16 in a table or across multiple tables?

You can mix and match encodings in your application. The database 
encoding determines how strings are actually stored in the file (and 
it's database-wide, not per table). SQLite API converts back and forth 
as necessary.

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Vinnie


> From: "Igor Tandetnik" 
> You could convert your file name from UTF-16 to UTF-8, then
> call sqlite3_open_v2.

Converting the file name is no problem. But I thought that depending on how you 
opened the database (open16 versus open_v2), SQL treats your strings 
differently. I don't care about the encoding used to pass the filename, I care 
about the strings in my table rows.

Or does the encoding for the file name used to open the database not matter to 
subsequent SQLite SQL statements? Can I mix and match UTF-8 and UTF-16 in a 
table or across multiple tables?

> See PRAGMA user_version
> (http://sqlite.org/pragma.html#version) - it's 
> designed specifically to do this sort of thing.

Yes I see, thank you very much. This is exactly what I am already trying to 
with my VERSION table.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Eric Minbiole
> Dear Group:
> 
> When my application launches I want to open the associated database, and if 
> that fails because the file does not exist then I would create a new 
> database. 
> 
> sqlite3_open_v2() is ideal for this purpose because you can leave out 
> SQLITE_OPEN_CREATE flag, and specify SQLITE_OPEN_READWRITE.
> 
> Unfortunately, this is all academic because I am using sqlite3_open16()! 
> Where is the UTF-16 version that accepts the flags as a parameter? How can I 
> achieve the same functionality? Let me add that I am not too keen on 
> modifying sqlite.c so thats not an option (too much hassle when new versions 
> come out).
> 
> How did this oversight happen? And what is the workaround? How can I tell, 
> after a call to sqlite3_open16() if the database was created? The first thing 
> I do when it opens is SELECT * FROM VERSION to see if I have to upgrade the 
> database to a new version of my application data. I guess that call could 
> fail and that would be my clue to create all the tables.
> 
> But what if the SELECT fails for a different reason? How do I distinguish it? 
> How do I make this robust?
> 
> Thanks!

I would assume the intent is that you convert your UTF-16 filename to 
UTF-8, and then call sqlite3_open_v2().  I don't know what platform you 
are running on, but you may have some conversion APIs available to you. 
  If not, unicode.org provides some nice sample code that performs the 
conversion:

http://unicode.org/faq/utf_bom.html#UTF8

~Eric
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UTF-16 API a second class citizen?

2009-04-07 Thread Igor Tandetnik
Vinnie  wrote:
> Unfortunately, this is all academic because I am using
> sqlite3_open16()! Where is the UTF-16 version that accepts the flags
> as a parameter? How can I achieve the same functionality?

You could convert your file name from UTF-16 to UTF-8, then call 
sqlite3_open_v2.

> How did this oversight happen? And what is the workaround? How can I
> tell, after a call to sqlite3_open16() if the database was created?
> The first thing I do when it opens is SELECT * FROM VERSION to see if
> I have to upgrade the database to a new version of my application
> data. I guess that call could fail and that would be my clue to
> create all the tables.

See PRAGMA user_version (http://sqlite.org/pragma.html#version) - it's 
designed specifically to do this sort of thing. A freshly created 
database will have user_version of 0. You would normally set it to some 
positive number (internally meaningful to your application - SQLite 
doesn't care) after creating your tables.

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users