On Fri, Jun 24, 2016 at 11:08 AM, Igor Korot <ikoro...@gmail.com> wrote:

> > To answer your main question: Will a DB in SQLite produce the same
> > characters/encoding in Germany as in the US or China and can data be
> > safely sent from one to the other?
> > The answer is: It should... if the Apps you use convert values correctly
> > between UTF-8 and local encodings (Steps: INPUT-TRANS-OUTPUT above), then
> > the SQLite API will do its part correctly (STORE-RETRIEVE) and you should
> > see the same things everywhere. If this isn't the case, then the Apps you
> > use are broken.
>
> OK, so all in all.
> What I gather from all your replies is that however I enter the data -
> table name, table fields - whether with "ALT+num" or by typing it directly
> on the keyboard, and independently of where the input is produced - US,
> Germany or China - querying such a database will work correctly around
> the world.
>
> Am I understanding this right?
>
> If yes, then going back to my original post:
>
> struct Table
> {
>     std::wstring name;
>     std::vector<Fields> fields;
>     std::vector<FKeys> foreign_keys;
> };
>
> std::wstring_convert<std::codecvt_utf8<wchar_t> > myconv;
> const unsigned char *tableName = sqlite3_column_text( stmt, 0 );
> pimpl->m_tables[m_catalog].push_back(
>     Table( myconv.from_bytes( (const char *) tableName ), fields, foreign_keys ) );
>
> This code, compiled with MSVC 2010 as a C++11 snippet, crashes
> on the "ALT+225" symbol inside myconv.from_bytes().
> And if my understanding above is correct, it shouldn't crash.
>
> So what can I do to fix it?
>
> Thank you.
>

On Windows, when you get a string of characters, you either get an ANSI
string using some code page, or you get a wide character string.

When you get an ANSI string, it is just a sequence of 8-bit bytes. UTF-8 is
also a sequence of 8-bit bytes, but the meaning (the encoding) of those
bytes is very different.

SQLite will allow you to write any 8 bit byte sequence you want as a
string. It does not attempt to validate the bytes. It will read the bytes
back exactly as written. So if you wrote an ANSI string to the database
instead of a UTF-8 string, you will get back the ANSI string.

This all assumes you're using the UTF-8 functions, which might be more
accurately described as byte functions. SQLite databases have an encoding.
They store either UTF-8 text or UTF-16 text. If your database is UTF-8 and
you use the char/byte based interface, SQLite won't interpret the bytes. If
your database is UTF-16 and you use the wide character based interface,
SQLite won't interpret the wide characters. It assumes you've given it
valid data and will use it as is. This is particularly convenient when
dealing with variant columns.

If, however, your database is UTF-8 and you use the UTF-16 interface
functions, SQLite will attempt to convert the data between UTF-8 & UTF-16.
If your database is UTF-16 and you use the UTF-8 interface functions,
SQLite will attempt to convert the data. In these cases, it is important to
have valid UTF-whatever in the database.

It looks to me like, in your case, some program wrote a byte sequence to
the database that was not UTF-8. You later read that string back out of the
database, and attempt to convert it to a wstring with your C++ code. The
byte sequence was not UTF-8, hence the failure.
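
One way to confirm this diagnosis is to check the bytes before converting
them. What follows is only a minimal sketch, not anything SQLite provides:
is_valid_utf8 is a hypothetical helper name of my own, and it uses nothing
but the standard library. It rejects stray or missing continuation bytes,
overlong forms, surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SQLite or the standard library):
// returns true only if `s` is well-formed UTF-8.
bool is_valid_utf8(const std::string &s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;     // continuation bytes expected after b
        unsigned long cp = 0;      // code point being decoded
        unsigned long min_cp = 0;  // smallest code point legal at this length

        if (b < 0x80) {            // plain ASCII
            ++i;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            extra = 1; cp = b & 0x1F; min_cp = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            extra = 2; cp = b & 0x0F; min_cp = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            extra = 3; cp = b & 0x07; min_cp = 0x10000;
        } else {
            return false;          // stray continuation or invalid lead byte
        }

        if (n - i - 1 < extra)
            return false;          // sequence is cut off at end of input

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;      // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;          // overlong, out of range, or surrogate

        i += extra + 1;
    }
    return true;
}
```

Calling something like this on the bytes returned by sqlite3_column_text()
before handing them to myconv.from_bytes() would tell you whether the stored
name is the problem. A lone byte 225 (0xE1) fails the check, because in UTF-8
it announces a three-byte sequence that never arrives. As far as I know,
from_bytes() throws std::range_error on such input when the converter was
constructed without an error string, and an uncaught exception would look
like a crash.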

I seem to recall a recent discussion on the list about the shell and
console input / output and it not being treated 100% accurately as
UTF-whatever. Library internals are, but the IO layer in the shell, not so
much.

Thus you cannot depend on the shell to translate non-ASCII characters on
Windows and write them as UTF-whatever. If using the shell is essential to
your process, you can't currently get there from here.

Though maybe ... instead of typing ALT+225, try typing ALT+195 ALT+159. In
your Windows console, that would give you the equivalent byte sequence for
that character, compensating for the fact that SQLite doesn't (I believe)
transform console input to UTF-8. If I am mistaken on that point, I
apologize.
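
For reference, this is where 195 and 159 come from. I am assuming the console
code page maps ALT+225 to the sharp s (U+00DF), as code page 437 does. The
sketch below, with a hypothetical function name encode_utf8, encodes a single
code point as UTF-8 using only the standard library.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encode_utf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out;  // surrogates are not encodable
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

encode_utf8(0xDF) yields the two bytes 0xC3 0x9F, which are 195 and 159 in
decimal, hence the two alt codes suggested above.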

If the two alt-code byte sequences create data your C++ code can then
process (because it's valid UTF-8), you'll know for certain that the SQLite
shell on Windows does not process UTF-8 for console IO, just internally to
the database layer.

-- 
Scott Robison
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
