Jean-Christophe:

Thanks for your advice, very helpful and illustrative for me.

Fortunately I'm not in the same horror story, thanks to the people on this
list, and because I use the old and simple way: the plain SQLite C
interface, the plain Win32 C API and plain Visual C++ (multibyte build in
the existing versions, and Unicode build for the new international release
of the app).

Despite that, I'm aware that I have more than pure US-ASCII in the blob
objects. In fact I'm close to your situation, because I use the Spanish
language and have 8-bit extended ASCII with some special characters
(accented characters and so on).

So the answer to your question is yes: I have upper-ANSI data stored, and
need to convert it to MS VC++ wchar_t strings to rebuild the dBase. At this
point I simply use mbstowcs() to do the job.
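
A minimal sketch of that conversion step, under some assumptions. The
helper name blob_to_wide() is hypothetical. It assumes the bytes come from
sqlite3_column_blob() with their length from sqlite3_column_bytes(), so they
are not NUL-terminated and must be copied before mbstowcs() can be used. It
also assumes the caller has already called setlocale(LC_CTYPE, "") so that
the conversion follows the user's locale; as far as I know, in the default
"C" locale the system code page is not used.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert n bytes of text in the current locale's
 * multibyte encoding into a newly allocated, NUL-terminated wide string.
 * Returns NULL on allocation failure or on an invalid byte sequence.
 * The caller must free() the result.
 * Note: conversion stops at the first NUL byte inside the input. */
wchar_t *blob_to_wide(const void *data, size_t n)
{
    /* mbstowcs() needs a NUL-terminated source, so make a copy. */
    char *tmp = malloc(n + 1);
    if (tmp == NULL)
        return NULL;
    if (n > 0)
        memcpy(tmp, data, n);
    tmp[n] = '\0';

    /* Every multibyte character takes at least one byte, so the result
     * never needs more than n wide characters plus the terminator. */
    wchar_t *out = malloc((n + 1) * sizeof *out);
    if (out == NULL) {
        free(tmp);
        return NULL;
    }

    size_t len = mbstowcs(out, tmp, n + 1);
    free(tmp);
    if (len == (size_t)-1) {      /* invalid sequence for this locale */
        free(out);
        return NULL;
    }
    out[len] = L'\0';             /* len <= n, so this stays in bounds */
    return out;
}
```

The result could then be bound with sqlite3_bind_text16() or used directly
with the wide-character Win32 calls in a Unicode build.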

Because there are some users in the field, the new app can detect whether
the dBase corresponds to a previous version and rebuild it to the new
version if needed. So I can be almost sure that the user is using the same
system code page as the one used when the data was originally inserted.

Is there anything weird in my reasoning?

A.J. Millan


----- Original Message ----- 
From: "Jean-Christophe Deschamps" <j...@q-e-d.org>
To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org>
Sent: Thursday, October 29, 2009 3:04 PM
Subject: Re: [sqlite] Some clarification needed about Unicode


Hi,


Please follow Igor's advice; he is right.


>[1] Read the actual textual data with sqlite3_column_blob()

Which you can directly convert to TEXT if, as you say, you entered only
7-bit ASCII or UTF-8 compliant data.

>[2] Assuming the system code page matches the one used when the data was
>originally inserted, convert with mbstowcs()

Forget that.

>[3] (Doubt) The result can be directly written with
>sqlite3_bind_text() (I want to store it in UTF-8)

Why not convert the column(s) directly from blob to TEXT?

>OR must I write the result with sqlite3_bind_text16()? Then, is the data
>stored as UTF-16 or as UTF-8?

Forget this too.

>[4] Afterward once converted the dBase

I take this to mean "convert your blobs to TEXT": yes, this is the
_only_ step you need for the whole thing.

>  and in regular use:
>
>[4-1a] Read with sqlite3_column_text()
>
>[4-1b] convert with WideCharToMultiByte(CP_UTF8)

Why?  Read it off as text16 directly into your C++ string; there must
exist zillions of [working] wrappers for VC++.

>[4-1c] Use the result with Win32 api -SetTex()-

No particular API.  Try a simple MsgBox to see for yourself, or even a
basic "Héllô wörld!" in a Windows _console_ (printf).

>OR?
>
>[4-2a] Read with sqlite3_column_text16()
>[4-2b] No conversion needed.
>[4-2c] Use the result ...

YES !!!

The only catch would be if you have (knowingly or not) entered 8-bit
data (the upper 128 characters in the Win codepage) encoded in __ANSI__
(1 char = 1 byte) and stored in your blobs.  In such a case, the
representation of the characters is not the expected UTF-8.

As far as I understand it, current SQLite _should_ read/write
non-compliant UTF-8 without problem, but I haven't checked that the latest
versions are still byte-neutral.  But that is not the right path.  If you
have upper-ANSI strings, convert them to UTF-*, where * is your
encoding choice for the new database.
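
To illustrate why upper-ANSI bytes are not valid UTF-8 as they stand, here
is a small sketch. It assumes, purely for illustration, that the stored
bytes are ISO-8859-1 (Latin-1); Windows-1252 differs from it in the range
0x80-0x9F, so a real conversion should go through the actual system code
page instead. The function name latin1_to_utf8() is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustration only. ASSUMPTION: the input is ISO-8859-1, where each
 * byte value equals its Unicode code point. Returns a newly allocated,
 * NUL-terminated UTF-8 string, or NULL on allocation failure.
 * The caller must free() the result. */
char *latin1_to_utf8(const void *data, size_t n)
{
    const unsigned char *in = data;
    /* Each input byte becomes at most two UTF-8 bytes. */
    char *out = malloc(2 * n + 1);
    size_t j = 0;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            /* 7-bit ASCII is identical in both encodings. */
            out[j++] = (char)c;
        } else {
            /* Code points 0x80..0xFF need two bytes: 110xxxxx 10xxxxxx. */
            out[j++] = (char)(0xC0 | (c >> 6));
            out[j++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    return out;
}
```

For example, the single Latin-1 byte 0xF1 (the Spanish "ñ") becomes the two
UTF-8 bytes 0xC3 0xB1.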

<horror story>
I've spent an indecent amount of time sorting out all these questions,
just because someone decided to silently convert UTF-16 native Windows
strings to ANSI for every SQLite UTF-8 interface in the SQLite wrapper
built into the development tool I use.

It was very hard for me to figure this out, because a call similar to
printf _also_ converted to ANSI (silently) and a call used to display a
2D table _also_ converted to ANSI (silently). So sometimes I had French
and other European diacritics converted (i.e. completely destroyed)
like this: UTF-16 --> ANSI --> UTF-16 --> ANSI.

It was just as if I had tried to read the original plain text by looking
only at its MD5.  It's been a _real_ nightmare and I had data from 15
countries, some using "weird" (to me) scripts.
</horror story>


But you're not even close to this terrible situation.

Just determine if you have upper-ANSI data stored and convert it if needed.
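
One possible way to make that check is sketched below. The names
classify_bytes() and enum text_kind are hypothetical, and the input is
assumed to be the raw bytes of one blob together with its length. The test
is only a heuristic: a few ANSI byte sequences happen to also be valid
UTF-8, so a TEXT_UTF8 result is strong evidence, not proof.

```c
#include <stddef.h>

enum text_kind {
    TEXT_ASCII,   /* only bytes below 0x80 */
    TEXT_UTF8,    /* has high bytes, all forming well-formed UTF-8 */
    TEXT_OTHER    /* has high bytes that are not UTF-8: likely ANSI */
};

/* Hypothetical helper: classify n raw bytes. */
enum text_kind classify_bytes(const void *data, size_t n)
{
    const unsigned char *p = data;
    enum text_kind kind = TEXT_ASCII;
    size_t i = 0;

    while (i < n) {
        unsigned char c = p[i];
        size_t extra;             /* continuation bytes expected */
        unsigned long cp, min;    /* code point and smallest legal value */

        if (c < 0x80) { i++; continue; }

        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return TEXT_OTHER;      /* stray continuation or bad lead byte */

        if (n - i <= extra)
            return TEXT_OTHER;       /* sequence truncated at end of data */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80)
                return TEXT_OTHER;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFFUL || (cp >= 0xD800 && cp <= 0xDFFF))
            return TEXT_OTHER;

        kind = TEXT_UTF8;
        i += extra + 1;
    }
    return kind;
}
```

Blobs reported as TEXT_ASCII or TEXT_UTF8 can be turned into TEXT as they
are; only those reported as TEXT_OTHER need a code page conversion first.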

Well, stay tuned, I'll do something for you, just allow me some time...

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
