On 9/1/2011 10:24 AM, Mohit Sindhwani wrote:
I understand that the database could be either UTF-8 or UTF-16 - but
that would apply to the full DB not to a single column, right?

Right.

If that
is the case, would it not make the database larger if we had a lot of
content that was originally ASCII?

UTF-8 is a superset of ASCII. Strings that consist entirely of 7-bit ASCII characters are represented exactly the same way in ASCII and in UTF-8. That's probably what you want to stick with.

On the other hand, the other language that we are storing seems to
require 3 bytes in UTF-8. Given that, it would appear that using UTF-8
would be a better idea since it will store more "efficiently".

If you have lots of Chinese (or Japanese or Korean) text to store, then UTF-16 might be more compact. For these languages, one character takes three bytes in UTF-8 but only two in UTF-16. On the other hand, plain ASCII characters take one byte in UTF-8 but still two bytes in UTF-16. So if you have a mix of the two, the issue gets murky.
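
To get a feel for where the break-even point lies, here is a rough sketch in Python that compares encoded sizes for a few sample strings. The helper name encoded_sizes and the sample strings are made up for illustration, and the numbers cover only the raw text bytes, not SQLite's per-row and per-page overhead.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given string.

    "utf-16-le" is used so that no byte-order mark is counted.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical samples: pure ASCII, pure Japanese, and a mix of both.
for sample in ("hello", "日本語", "hello 日本語"):
    utf8, utf16 = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

Running something like this over a representative slice of your real data should tell you which encoding is smaller for your particular mix.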

In addition, there are a few other questions:
- FTS would work fine on both UTF-8 and UTF-16 databases, right?

I believe so, but I'm not very familiar with FTS.

- Can we attach two databases that have different encodings?

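No. According to the PRAGMA encoding documentation, an attempt to ATTACH a database whose text encoding differs from that of the main database will fail. What SQLite does do automatically and transparently is convert between UTF-8 and UTF-16 at the API level: you can bind and read strings in either encoding regardless of how the database stores them.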

- When using Wide Strings in Windows CE, is one encoding more preferable
over the other to minimize conversions?

The native Windows API uses UTF-16. You can request UTF-16 strings even from a UTF-8 database (for example, with sqlite3_column_text16) - like I said, SQLite converts between them transparently. The cost of conversion is likely negligible compared to the other costs of maintaining a database. In fact, for mostly-ASCII content UTF-8 might win simply because it means less data to read from storage, even if it requires conversion. The only way to be sure is to test and measure.

I already have a database that has a couple of tables that are in UTF-8
- is there an easy way for me to build a database from this that is UTF-16?

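Using the sqlite3 command line utility, run the .dump command on the old database and save the output to a file. Create a new database. Before creating any tables in it, use "PRAGMA encoding" to set it to UTF-16. Then run the .read command on it using the dump file from the old one (.dump produces SQL statements, so it is .read, not .import, that loads them).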
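
If you would rather script the dump-and-reload than type it into the shell, something along these lines should work with Python's standard sqlite3 module. This is only a sketch: the function name rebuild_as_utf16 and the file paths are placeholders, and it assumes the destination file does not exist yet (or is empty), since the encoding can only be chosen before the first table is created.

```python
import sqlite3


def rebuild_as_utf16(src_path, dst_path):
    """Copy the schema and data of src_path into a new UTF-16 database.

    Returns the encoding reported by the new database.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Has an effect only while the destination database is still empty.
        dst.execute("PRAGMA encoding = 'UTF-16'")
        # iterdump() yields SQL text comparable to the shell's .dump output;
        # executescript() replays it, much like .read would.
        dst.executescript("\n".join(src.iterdump()))
        return dst.execute("PRAGMA encoding").fetchone()[0]
    finally:
        src.close()
        dst.close()
```

Calling rebuild_as_utf16("old.db", "new.db") (both names hypothetical) returns the encoding SQLite actually picked, which is UTF-16 in the machine's native byte order.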
--
Igor Tandetnik

