On Fri, Jun 24, 2016 at 12:03 PM, Scott Robison <sc...@casaderobison.com>
wrote:

> On Windows, when you get a string of characters, you either get an ANSI
> string using some code page, or you get a wide character string.
>
> When you get an ANSI string, it is just a sequence of 8-bit bytes. UTF-8
> is also a sequence of 8-bit bytes. The meaning / encoding of those 8-bit
> bytes is very different.
>
> SQLite will allow you to write any 8 bit byte sequence you want as a
> string. It does not attempt to validate the bytes. It will read the bytes
> back exactly as written. So if you wrote an ANSI string to the database
> instead of a UTF-8 string, you will get back the ANSI string.
>
> This all assumes you're using the UTF-8 functions, which might be more
> accurately described as byte functions. SQLite databases have an encoding.
> They store either UTF-8 text or UTF-16 text. If your database is UTF-8 and
> you use the char/byte based interface, SQLite won't interpret the bytes. If
> your database is UTF-16 and you use the wide character based interface,
> SQLite won't interpret the wide characters. It assumes you've given it
> valid data and will use it as is. This is particularly convenient when
> dealing with variant columns.
>
> If, however, your database is UTF-8 and you use the UTF-16 interface
> functions, SQLite will attempt to convert the data between UTF-8 & UTF-16.
> If your database is UTF-16 and you use the UTF-8 interface functions,
> SQLite will attempt to convert the data. In these cases, it is important to
> have valid UTF-whatever in the database.
>
> It looks to me like, in your case, some program wrote a byte sequence to
> the database that was not UTF-8. You later read that string back out of the
> database, and attempt to convert it to a wstring with your C++ code. The
> byte sequence was not UTF-8, hence the failure.
>
> I seem to recall a recent discussion on the list about the shell and
> console input / output and it not being treated 100% accurately as
> UTF-whatever. Library internals are, but the IO layer in the shell, not so
> much.
>
> Thus you cannot depend on the shell to translate non-ASCII characters on
> Windows and write them as UTF-whatever. If using the shell is essential to
> your process, you can't currently get there from here.
>
> Though maybe ... instead of typing ALT+225, try typing ALT+195 ALT+159. In
> your windows console, that would give you the equivalent byte sequence for
> that character, compensating for the fact that SQLite doesn't (I believe)
> transform console input to UTF-8. If I am mistaken on that point, I
> apologize.
>
> If the two alt-code byte sequences create data your C++ code can then
> process (because it's valid UTF-8), you'll know for certain that the SQLite
> shell on Windows does not process UTF-8 for console IO, just internally to
> the database layer.
>

Okay, rather than guessing, I just did a test from a Windows 10 command
prompt. I am getting appropriate UTF-8 sequences. Here is my experiment:

I opened an in-memory database and issued the following commands:

create table test(a text);
-- the four values were typed as ALT+225, ALT+223, ALT+0225, ALT+0223
insert into test values('ß'),('▀'),('á'),('ß');
select a, hex(a) from test;

Which resulted in four rows of output:

ß|C39F
▀|E29680
á|C3A1
ß|C39F
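
As a cross-check that bypasses the console entirely, here is a minimal sketch using Python's standard sqlite3 module (chosen for illustration, not part of the experiment above). It assumes the default UTF-8 database encoding and relies on the module handing str values to SQLite as UTF-8. The characters are written as Unicode escapes so the script's own file encoding cannot interfere.

```python
import sqlite3

# In-memory database, mirroring the shell session.
conn = sqlite3.connect(":memory:")
conn.execute("create table test(a text)")

# The four characters from the experiment: sharp s, upper half block,
# a with acute, sharp s.
chars = ["\u00df", "\u2580", "\u00e1", "\u00df"]
conn.executemany("insert into test values (?)", [(c,) for c in chars])

rows = conn.execute("select a, hex(a) from test order by rowid").fetchall()
for text, stored_hex in rows:
    # stored_hex is what SQLite holds; the last column is Python's own
    # UTF-8 encoding of the same character, for comparison.
    expected_hex = text.encode("utf-8").hex().upper()
    print(f"U+{ord(text):04X}", stored_hex, expected_hex)
```

If the hex(a) column from the shell matches the last column printed here, the console input was converted to UTF-8 before it reached the database.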

I'm hoping all these extended characters are handled properly by gmail and
whatever email program you use.

Windows supports legacy ALT+### codes, which map to the OEM code page (CP437
on a US English system). It also supports ALT+0### codes, which map to the
ANSI code page (Windows-1252 on a US English system); for 160 through 255
those values match the Unicode code points. This allows people who are
accustomed to the ALT+### format to still see the character they expect,
translated to the equivalent Unicode code point.
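
The two ALT code families can be modelled with a short Python sketch. It assumes a US English system, where the OEM code page is CP437 and ALT+0### behaves like Windows-1252 (which agrees with Unicode code points for 160 through 255). The helper name `alt_code_to_char` is hypothetical, and the sketch only covers values 128 to 255.

```python
# Assumed code pages for a US English Windows install; other locales differ.
OEM_CODE_PAGE = "cp437"    # used by ALT+### (no leading zero)
ANSI_CODE_PAGE = "cp1252"  # used by ALT+0### (leading zero)

def alt_code_to_char(keys: str) -> str:
    """Hypothetical helper: the character ALT+<keys> would produce."""
    value = int(keys)
    if not 128 <= value <= 255:
        raise ValueError("this sketch only models codes 128 through 255")
    code_page = ANSI_CODE_PAGE if keys.startswith("0") else OEM_CODE_PAGE
    # A few Windows-1252 bytes (e.g. 129) are undefined and will raise
    # UnicodeDecodeError here.
    return bytes([value]).decode(code_page)

for keys in ("225", "223", "0225", "0223"):
    ch = alt_code_to_char(keys)
    utf8_hex = ch.encode("utf-8").hex().upper()
    print(f"ALT+{keys} -> U+{ord(ch):04X}, UTF-8 {utf8_hex}")
```

Under these assumptions ALT+225 and ALT+0223 both yield U+00DF, which is why the first and last rows of the experiment show the same character.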

Again, this is with Windows 10. Perhaps you could type a similar sequence
into your own version of the SQLite shell and Windows command prompt and
see what you get back.
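
To interpret whatever hex(a) comes back, a rough sketch like the following may help. It tries a strict UTF-8 decode first and falls back to a code page; the helper name `describe_stored_bytes` and the CP437 default are assumptions for illustration. Note that a successful decode only shows the bytes are well-formed UTF-8, not that UTF-8 was intended.

```python
def describe_stored_bytes(hex_string: str, fallback_code_page: str = "cp437") -> str:
    """Hypothetical helper: classify a hex(a) value copied from the shell."""
    raw = bytes.fromhex(hex_string)
    try:
        # Strict decoding raises on any byte sequence that is not UTF-8.
        text = raw.decode("utf-8")
        return f"valid UTF-8: {text!a}"
    except UnicodeDecodeError:
        # Show what the bytes would mean in the assumed legacy code page.
        text = raw.decode(fallback_code_page, errors="replace")
        return f"not UTF-8; as {fallback_code_page}: {text!a}"

print(describe_stored_bytes("C39F"))  # two-byte UTF-8 sequence
print(describe_stored_bytes("E1"))    # a lone code-page byte
```

A single byte such as E1 failing the UTF-8 check would suggest the shell stored the raw console code-page byte rather than converting it.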

-- 
Scott Robison
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
