On 9/1/2011 10:24 AM, Mohit Sindhwani wrote:
I understand that the database could be either UTF-8 or UTF-16 - but that would apply to the full DB not to a single column, right?
Right.
If that is the case, would it not make the database larger if we had a lot of content that was originally ASCII?
UTF-8 is a superset of ASCII. Strings that consist entirely of 7-bit ASCII characters are represented exactly the same way in ASCII and in UTF-8. That's probably what you want to stick with.
On the other hand, the other language that we are storing seems to require 3 bytes per character in UTF-8. Given that, it would appear that using UTF-16 would be a better idea since it will store more "efficiently".
If you have lots of Chinese (or Japanese or Korean) text to store, then UTF-16 might be more compact. For these languages, one character takes three bytes in UTF-8 but only two in UTF-16. On the other hand, plain ASCII characters take one byte in UTF-8 but still two bytes in UTF-16. So if you have a mix of the two, the issue gets murky.
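To make the byte counts above concrete, here is a small Python sketch that encodes a few strings both ways and compares the lengths. The sample strings and the helper name `encoded_sizes` are made up for illustration; UTF-16 is encoded without a byte-order mark so that only the character data is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" would add.
        "utf-16": len(text.encode("utf-16-le")),
    }


# Hypothetical sample strings: pure ASCII, pure CJK, and a mix of the two.
samples = ["hello", "日本語", "SQLite 数据库"]

for s in samples:
    sizes = encoded_sizes(s)
    print(f"{s!r}: UTF-8 = {sizes['utf-8']} bytes, UTF-16 = {sizes['utf-16']} bytes")
```

For real data, running something like this over a representative sample of the actual text is probably a better guide than reasoning about it in the abstract.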
In addition, there are a few other questions:
- FTS would work fine on both UTF-8 and UTF-16 databases, right?
I believe so, but I'm not very familiar with FTS.
- Can we attach two databases that have different encodings?
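No, as far as I can tell from the documentation: an attached database must use the same text encoding as the main database, and ATTACH fails otherwise. What SQLite does convert automatically, in a transparent fashion, is the text you pass in and get back through the API (sqlite3_bind_text16, sqlite3_column_text16 and so on), whatever the encoding of the database file.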
- When using Wide Strings in Windows CE, is one encoding more preferable over the other to minimize conversions?
The native API in Windows uses UTF-16. You can request UTF-16 strings even from a UTF-8 database - like I said, SQLite converts between them transparently. The cost of conversion is likely negligible compared to the other costs of maintaining a database. In fact, UTF-8 might win simply because it means less data to read from the hard drive, even if it requires conversion. The only way to be sure is to test and measure.
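As one rough way to "test and measure", the following Python sketch builds the same table in a UTF-8 and a UTF-16 database and compares the resulting file sizes. It only measures size on disk, not conversion or I/O time; the table name `t`, the helper `db_size`, and the sample rows are all invented for the example, so substitute your own data.

```python
import os
import sqlite3
import tempfile


def db_size(encoding: str, rows: list) -> int:
    """Create a throwaway database with the given text encoding, insert
    `rows` into a one-column table, and return the file size in bytes."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)  # SQLite treats the empty file as a brand-new database
    try:
        conn = sqlite3.connect(path)
        # The encoding can only be chosen before any content is written.
        conn.execute(f'PRAGMA encoding = "{encoding}"')
        conn.execute("CREATE TABLE t(s TEXT)")
        conn.executemany("INSERT INTO t VALUES (?)", ((r,) for r in rows))
        conn.commit()
        conn.close()
        return os.path.getsize(path)
    finally:
        os.remove(path)


# Hypothetical sample data: 2000 rows of ASCII text and 2000 rows of CJK text.
ascii_rows = ["plain ASCII text " * 6] * 2000
cjk_rows = ["漢字かな交じり文" * 12] * 2000

for label, rows in (("ASCII", ascii_rows), ("CJK", cjk_rows)):
    for enc in ("UTF-8", "UTF-16le"):
        print(f"{label:5s} {enc:8s} {db_size(enc, rows)} bytes")
```

Timing the queries you actually run, on the target device, would be the natural next step; file size is only a proxy for the amount of data that has to be read.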
I already have a database that has a couple of tables that are in UTF-8 - is there an easy way for me to build a database from this that is UTF-16?
Using the sqlite3 command-line utility, run the .dump command on the old database and save the output to a file. Create a new database and, before creating any tables, use "PRAGMA encoding" to set it to UTF-16. Then run the .read command on it using the dump file from the old one.
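The same dump-and-reload idea can also be sketched with Python's built-in sqlite3 module, in case the command-line shell is not handy. This is only an illustration under assumptions: the helper name `copy_as_utf16` and the `words` table are invented, and in-memory databases stand in for the real files.

```python
import sqlite3


def copy_as_utf16(src: sqlite3.Connection, dst: sqlite3.Connection) -> None:
    """Replay a SQL dump of `src` into the empty database `dst`,
    after switching `dst` to UTF-16."""
    # Must run before any table exists in dst; afterwards the pragma is ignored.
    dst.execute('PRAGMA encoding = "UTF-16"')
    # iterdump() yields the same kind of SQL text that the shell's .dump prints.
    dst.executescript("\n".join(src.iterdump()))


# Stand-in for the existing UTF-8 database (hypothetical table and data).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE words(id INTEGER PRIMARY KEY, word TEXT)")
src.executemany("INSERT INTO words(word) VALUES (?)", [("hello",), ("日本語",)])
src.commit()

# Stand-in for the new, empty database.
dst = sqlite3.connect(":memory:")
copy_as_utf16(src, dst)

# Reports UTF-16le or UTF-16be, depending on the machine's byte order.
print(dst.execute("PRAGMA encoding").fetchone()[0])
print(dst.execute("SELECT word FROM words ORDER BY id").fetchall())
```

For a real conversion you would pass file paths to `sqlite3.connect` instead of ":memory:", and make sure the destination file does not exist yet.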
-- Igor Tandetnik
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users