Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-21 Thread Winfried [via SQLite]
Thanks for the infos.




__
If you reply to this email, your message will be added to the discussion below:
http://sqlite.1065341.n5.nabble.com/How-to-search-for-fields-with-accents-in-UTF-8-data-tp96249p96294.html
This email was sent by Winfried (via Nabble)
To receive all replies by email, subscribe to this discussion: 
http://sqlite.1065341.n5.nabble.com/template/NamlServlet.jtp?macro=subscribe_by_code=96249=c3FsaXRlLXVzZXJzQG1haWxpbmdsaXN0cy5zcWxpdGUub3JnfDk2MjQ5fC0xNDUwNjI0MDQ5
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Olivier Mascia
> Le 20 juin 2017 à 20:04, Scott Robison  a écrit :
> 
> Regardless, I prefer UTF-8 to UCS-2 / UTF-16. Microsoft has certainly
> had time to make their interfaces more UTF-8 friendly. I just don't
> think they get enough credit for committing to Unicode in an era when
> few were.

We are in full agreement, here. :)

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Scott Robison
On Tue, Jun 20, 2017 at 8:17 AM, Olivier Mascia  wrote:
>> Le 20 juin 2017 à 15:24, R Smith  a écrit :
>>
>> As an aside - I never understood the reasons for that. I get that Windows 
>> has a less "techy" clientèle than Linux for instance, and that the backwards 
>> compatibility is paramount, and that no console command ever need fall 
>> outside the 7-bit ANSI range of characters... but geez, how much effort can 
>> it be to make it Unicode-friendly? It's not like the Windows API lacks any 
>> Unicode functionality - even Notepad can handle it masterfully.
>
> I wouldn't like looking like I'm trolling this subject, but this is only a 
> matter of I/O functions used by programs built to interact with the display 
> and keyboard when run in a console. Windows needs those programs to use 
> ReadConsoleW/WriteConsoleW to do the proper thing.  Those programs using C 
> library to read or output byte streams can't do anything equivalent no matter 
> what 'codepage' is set to be used or to/from what DBCS the program attempts 
> conversion to/from.
>
> I learned this postulat here last year and have had excellent success with 
> console I/O ever since in my programmings.

About a year ago I had to write an emergency fixup tool for my
employer because of a backward breaking change at Microsoft that was
almost certainly due to a breakdown in revision control. The tool
needed to be localized, but it was sufficiently simple that a console
mode executable was sufficient. I had to jump through hoops to make it
work, but (by way of confirmation), the problems were in the CRT, not
the Win32 API. It was able to write and read Unicode ...

> To be complete, regarding proper display of the output, there is a secondary 
> consideration. The fonts available in Windows are far from covering a large 
> subset of the glyphs.  For eastern languages on a western Windows edition, 
> you generally need to change your console settings to make it use another 
> font than the default one, just so that it can draw the needed glyphs.  But 
> the basic thing to do is get the program running in the console (here we are 
> talking shell.c - sqlite3.exe) to output Windows wide-chars using the 
> function WriteConsoleW(). And use ReadConsoleW() to read wide-chars chunks 
> from the console input, before converting internally to UTF-8 or whatever 
> wanted.

... assuming of course that the locale was using a font that supported
the character set for that area. This was true for our purposes by
default, as we weren't expecting English speaking customers to need to
see Asian languages.

> Sqlite3 shell.c when patched that way is as pleasant to use on Windows 
> console as it can be on a modern Linux or macOS.
>
> Input files feeded to sqlite3.exe need to be in UTF-8, as well as output sent 
> by sqlite3.exe will be: that part is perfectly OK today in sqlite3.exe. Only 
> the keyboard reading and console output writing lacks a little.

Agreed.

>> but geez, how much effort can it be to make it Unicode-friendly?
>
> To further comment on a more general plane than the sqlite3.exe, the issue is 
> deeper in Windows than in its console. Once upon a time (!), they made the 
> choice of 16 bits per characters encoding as the *right* way (their right 
> way!) to do Unicode. It took time for this to evolve, recognizing the need 
> for multi-16 bits words encoding (UTF-16), so they could have chosen UTF-8 
> from day one, but that was not what history recorded. Later UTF-8 got *some* 
> support in the OS (through conversion functions). But never UTF-8 was raised 
> to full citizenship.  There is even a CHCP 65001 to set the 'codepage' to 
> UTF-8. It works partly in some circumstances, but is far from being 'right'. 
> No matter what you would do, there is no way for any file I/O primitive of 
> the OS to take an UTF-8 string as a filename. And this extend to the 
> C-library on Windows platform. The only unicode support is to pass a UTF-16 
> filename through functions ending with a W in the name. Those 'ansi' 
> functions, ending with an A in the name, are merely wrappers converting to 
> the wide chars versions.  There have been numerous requests to Microsoft to 
> let people and developers set the ANSI codepage to UTF-8 so that the file I/O 
> functions taking a narrow char filename string can interpret it as UTF-8. 
> Some are still waiting for that day to come, others use the W-side of things, 
> complicating portability of their codebase. :)

Windows NT was released in 1993. It had been in development for years.
It decided Unicode for I18N/L10N/W6R (WhateveR) purposes was better
than a ton of different code pages. At the point Microsoft committed
to Unicode, it was a two byte / sixteen bit encoding. There was no
UTF-8. There was no UTF-16. Other than endian issues, there was
nothing to worry about. Win32 was an "all new" API.

POSIX people didn't want to re-write the entire API to support 16 bit
characters, 

Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Keith Medcalf
On Tuesday, 20 June, 2017 07:24, R Smith  wrote:

> Every lesson is valuable!  Just to be clear - there is nothing wrong
> with using the CLI. When pointing it to a file that is correctly encoded
> the import must work correctly (if not, it's a bug) - It's just
> difficult to enter weird and wonderful Unicode characters outside the
> BMP basic Latin plane (the first 127 code-points) via the console, or do
> queries using them, all because the Windows console specifically is not
> Unicode-enabled.

The Windows Console is fully UNICODE enabled and always has been, ever since 
the beginning of time.  It does not, however, support any character encoding 
other than the proprietary Microsoft DBCS/MBCS, and then those only when one 
uses specific console I/O -- just like you have to use specific I/O in a gooey 
application -- streams do not have the code necessary to perform anything other 
than ANSI Code Page I/O.  Specifically you CANNOT send any encoded anything to 
a stdio stream and expect it to get decoded.  What you send (and what you 
receive from the keyboard) is ANSI only.

This is the design of Windows.
 
> As an aside - I never understood the reasons for that. I get that
> Windows has a less "techy" clientèle than Linux for instance, and that
> the backwards compatibility is paramount, and that no console command
> ever need fall outside the 7-bit ANSI range of characters... but geez,
> how much effort can it be to make it Unicode-friendly? It's not like the
> Windows API lacks any Unicode functionality - even Notepad can handle it
> masterfully.

It is unicode friendly, and unicode is fully supported.  
The I/O streams do not support encoded byte streams is all.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Olivier Mascia
> Le 20 juin 2017 à 15:56, Simon Slavin  a écrit :
> 
> The console you see is pretty-much the one which was in Windows 3.1.  It does 
> not use the modern API written post-unicode, it calls the old 
> single-character Windows routines which are still in Windows so old programs 
> don’t suddenly stop working.  It has numerous parts which assume
> 
> one keypress == one character == one octet == one space on the display

The console I/O can be used to enter (keyboard) not-so-basic unicode codepoints 
and the output can be sent pretty much anything you want, as long as the font 
selected in the console has been built with enough support for the eventually 
exotic glyphs you want to output.  That's the application you run in the 
console that has to use the right APIs for the job 
(ReadConsoleW/WriteConsoleW).  The support of these goes back to Windows 2000.

With some modified sqlite3.exe using those, here is what I can type 
interactively at the console:

SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table é€(a text);
sqlite> insert into é€ values('ñ');
sqlite> select * from é€;
ñ
sqlite> .once a.txt
sqlite> .dump
sqlite> .q

Here is the content of the file as seen on console when doing "type a.txt":

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE ├®Ôé¼(a text);
INSERT INTO "├®Ôé¼" VALUES('├▒');
COMMIT;

You can see the UTF-8 output sent to the dump file is correct - open it with 
notepad for instance. Side note: the system command "type" is not able to 
display it unless you do a CHCP 65001. It just proves I could type characters 
on the keyboard input which do not follow the concept 'one keypress - one 
character - one octet - one space on the display'.

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Olivier Mascia
> Le 20 juin 2017 à 15:24, R Smith  a écrit :
> 
> As an aside - I never understood the reasons for that. I get that Windows has 
> a less "techy" clientèle than Linux for instance, and that the backwards 
> compatibility is paramount, and that no console command ever need fall 
> outside the 7-bit ANSI range of characters... but geez, how much effort can 
> it be to make it Unicode-friendly? It's not like the Windows API lacks any 
> Unicode functionality - even Notepad can handle it masterfully.

I wouldn't like looking like I'm trolling this subject, but this is only a 
matter of I/O functions used by programs built to interact with the display and 
keyboard when run in a console. Windows needs those programs to use 
ReadConsoleW/WriteConsoleW to do the proper thing.  Those programs using C 
library to read or output byte streams can't do anything equivalent no matter 
what 'codepage' is set to be used or to/from what DBCS the program attempts 
conversion to/from.

I learned this postulat here last year and have had excellent success with 
console I/O ever since in my programmings.

To be complete, regarding proper display of the output, there is a secondary 
consideration. The fonts available in Windows are far from covering a large 
subset of the glyphs.  For eastern languages on a western Windows edition, you 
generally need to change your console settings to make it use another font than 
the default one, just so that it can draw the needed glyphs.  But the basic 
thing to do is get the program running in the console (here we are talking 
shell.c - sqlite3.exe) to output Windows wide-chars using the function 
WriteConsoleW(). And use ReadConsoleW() to read wide-chars chunks from the 
console input, before converting internally to UTF-8 or whatever wanted.

Sqlite3 shell.c when patched that way is as pleasant to use on Windows console 
as it can be on a modern Linux or macOS.

Input files feeded to sqlite3.exe need to be in UTF-8, as well as output sent 
by sqlite3.exe will be: that part is perfectly OK today in sqlite3.exe. Only 
the keyboard reading and console output writing lacks a little.

> but geez, how much effort can it be to make it Unicode-friendly?

To further comment on a more general plane than the sqlite3.exe, the issue is 
deeper in Windows than in its console. Once upon a time (!), they made the 
choice of 16 bits per characters encoding as the *right* way (their right way!) 
to do Unicode. It took time for this to evolve, recognizing the need for 
multi-16 bits words encoding (UTF-16), so they could have chosen UTF-8 from day 
one, but that was not what history recorded. Later UTF-8 got *some* support in 
the OS (through conversion functions). But never UTF-8 was raised to full 
citizenship.  There is even a CHCP 65001 to set the 'codepage' to UTF-8. It 
works partly in some circumstances, but is far from being 'right'. No matter 
what you would do, there is no way for any file I/O primitive of the OS to take 
an UTF-8 string as a filename. And this extend to the C-library on Windows 
platform. The only unicode support is to pass a UTF-16 filename through 
functions ending with a W in the name. Those 'ansi' functions, ending with an A 
in the name, are merely wrappers converting to the wide chars versions.  There 
have been numerous requests to Microsoft to let people and developers set the 
ANSI codepage to UTF-8 so that the file I/O functions taking a narrow char 
filename string can interpret it as UTF-8. Some are still waiting for that day 
to come, others use the W-side of things, complicating portability of their 
codebase. :)


-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Gilles
Simon Slavin-3  > To clarify the clarification, you can use the SQLite 
shell tool just fine as long as you use it to process files, rather than 
expect characters which are entered through the keyboard or shown on the 
display to work.  So use ".read" or ".output" or ".once", and then use a 
non-console program to view the results.  Don’t type your text and view 
the results on the display.


Looks like using an alternative shell solves the problem:

http://conemu.github.io/

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Simon Slavin


On 20 Jun 2017, at 2:24pm, R Smith  wrote:

> Every lesson is valuable!  Just to be clear - there is nothing wrong with 
> using the CLI. When pointing it to a file that is correctly encoded the 
> import must work correctly (if not, it's a bug) - It's just difficult to 
> enter weird and wonderful Unicode characters outside the BMP basic Latin 
> plane (the first 127 code-points) via the console, or do queries using them, 
> all because the Windows console specifically is not Unicode-enabled.

To clarify the clarification, you can use the SQLite shell tool just fine as 
long as you use it to process files, rather than expect characters which are 
entered through the keyboard or shown on the display to work.  So use ".read" 
or ".output" or ".once", and then use a non-console program to view the 
results.  Don’t type your text and view the results on the display.

I don’t know the technical details of how windows does piping within the 
console.  It may or may not work to use command lines with ">" or "|" in.

> As an aside - I never understood the reasons for that. I get that Windows has 
> a less "techy" clientèle than Linux for instance, and that the backwards 
> compatibility is paramount, and that no console command ever need fall 
> outside the 7-bit ANSI range of characters... but geez, how much effort can 
> it be to make it Unicode-friendly? It's not like the Windows API lacks any 
> Unicode functionality - even Notepad can handle it masterfully.

The console you see is pretty-much the one which was in Windows 3.1.  It does 
not use the modern API written post-unicode, it calls the old single-character 
Windows routines which are still in Windows so old programs don’t suddenly stop 
working.  It has numerous parts which assume

one keypress == one character == one octet == one space on the display

these assumptions are not only in the code for the console itself but in the 
Windows routines it calls to do the work.  Rewriting the console to use the 
newer API calls, and also deal with the above assumptions not being true would 
be so major that Microsoft might as well start again from scratch.  And I’d 
expect the resulting code to be two or three times the size, and to be slower 
in execution.

This affects Powershell too, since Powershell runs inside the console.  But 
Powershell might do piping correctly, or be an improvement on the original 
shell in some other way.  I don’t use Windows at work so I don’t know.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Gilles

R Smith >

>> 2. Do not use the sqlite3.exe CLI if the data use anything more than
>> the basic latin alphabet. Instead, use a GUI application (eg. for
>> Windows, SQLite Studio, SQLitespeed, etc.)

> Every lesson is valuable!  Just to be clear - there is nothing wrong 
with using the CLI. When pointing it to a file that is correctly encoded 
the import must work correctly (if not, it's a bug) - It's just 
difficult to enter weird and wonderful Unicode characters outside the 
BMP basic Latin plane (the first 127 code-points) via the console, or do
queries using them, all because the Windows console specifically is not 
Unicode-enabled.


Yes, I should have been more precise: Using the CLI for importing data 
works fine; It's when typing accented characers that it fails.


> As an aside - I never understood the reasons for that.

Beats me. Maybe there is some legacy code somewhere deep in Windows' 
bowels that explains why the console (cmd.exe) isn't yet Unicode-capable.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread J. King
Indeed. Technically-minded Windows users do exist (Hi, Microsoft, I'm right 
here!), and I have neither the time nor the inclination to learn PowerShell 
when the Windows terminal is already adequate---with a set of ports of GNU 
tools, anyway. :)

On June 20, 2017 9:24:12 AM EDT, R Smith  wrote:
>
>On 2017/06/20 2:34 PM, Gilles wrote:
>>
>> Lessons I learned:
>>
>> 1. In CSV files, double-check how data are encoded
>>
>> 2. Do not use the sqlite3.exe CLI if the data use anything more than 
>> the basic latin alphabet. Instead, use a GUI application (eg. for 
>> Windows, SQLite Studio, SQLitespeed, etc.)
>
>Every lesson is valuable!  Just to be clear - there is nothing wrong 
>with using the CLI. When pointing it to a file that is correctly
>encoded 
>the import must work correctly (if not, it's a bug) - It's just 
>difficult to enter weird and wonderful Unicode characters outside the 
>BMP basic Latin plane (the first 127 code-points) via the console, or
>do 
>queries using them, all because the Windows console specifically is not
>
>Unicode-enabled.
>
>As an aside - I never understood the reasons for that. I get that 
>Windows has a less "techy" clientèle than Linux for instance, and that 
>the backwards compatibility is paramount, and that no console command 
>ever need fall outside the 7-bit ANSI range of characters... but geez, 
>how much effort can it be to make it Unicode-friendly? It's not like
>the 
>Windows API lacks any Unicode functionality - even Notepad can handle
>it 
>masterfully.
>
>
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread R Smith


On 2017/06/20 2:34 PM, Gilles wrote:


Lessons I learned:

1. In CSV files, double-check how data are encoded

2. Do not use the sqlite3.exe CLI if the data use anything more than 
the basic latin alphabet. Instead, use a GUI application (eg. for 
Windows, SQLite Studio, SQLitespeed, etc.)


Every lesson is valuable!  Just to be clear - there is nothing wrong 
with using the CLI. When pointing it to a file that is correctly encoded 
the import must work correctly (if not, it's a bug) - It's just 
difficult to enter weird and wonderful Unicode characters outside the 
BMP basic Latin plane (the first 127 code-points) via the console, or do 
queries using them, all because the Windows console specifically is not 
Unicode-enabled.


As an aside - I never understood the reasons for that. I get that 
Windows has a less "techy" clientèle than Linux for instance, and that 
the backwards compatibility is paramount, and that no console command 
ever need fall outside the 7-bit ANSI range of characters... but geez, 
how much effort can it be to make it Unicode-friendly? It's not like the 
Windows API lacks any Unicode functionality - even Notepad can handle it 
masterfully.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-20 Thread Gilles
Posted by David Raymond on Jun 19, 2017; 5:22pm > /The Windows command 
prompt and unicode have always not played well with each other. SQLite 
itself works perfectly with data on disk or in the database, there are 
just translation and display problems when going to and from the command 
prompt. /


Thanks much for the infos.

Lessons I learned:

1. In CSV files, double-check how data are encoded

2. Do not use the sqlite3.exe CLI if the data use anything more than the 
basic latin alphabet. Instead, use a GUI application (eg. for Windows, 
SQLite Studio, SQLitespeed, etc.)


Thank you all.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread David Raymond
The Windows command prompt and unicode have always not played well with each 
other. SQLite itself works perfectly with data on disk or in the database, 
there are just translation and display problems when going to and from the 
command prompt.

If you write out your query in, say, Notepad++ and save it in UTF-8, then you 
can do ".read queryFile.txt" from the CLI and be sure that it's reading it ok. 
(Assuming of course your DB isn't using one of the UTF-16 options) The output 
may still look weird if it would include accented characters, but anything like 
count(*) or unicode(something) that return numbers, or anything that's ASCII 
will always look ok.


foo.txt: Saved in UTF-8

.bail on
.echo on
create table if not exists foo (foo text collate nocase);
insert or ignore into foo values ('Île-de-France');
select * from foo;
select char(206), unicode('Î');
select count(*) from foo where foo = 'Île-de-France';

end foo.txt



D:\Temp>sqlite3
SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> .read foo.txt
create table if not exists foo (foo text collate nocase);
Run Time: real 0.001 user 0.00 sys 0.015600
insert or ignore into foo values ('Île-de-France');
Run Time: real 0.000 user 0.00 sys 0.00
select * from foo;
--EQP-- 0,0,0,SCAN TABLE foo
foo
Île-de-France
Run Time: real 0.001 user 0.00 sys 0.00
select char(206), unicode('Î');
char(206)|unicode('Î')
Î|206
Run Time: real 0.000 user 0.00 sys 0.00
select count(*) from foo where foo = 'Île-de-France';
--EQP-- 0,0,0,SCAN TABLE foo
count(*)
1
Run Time: real 0.000 user 0.00 sys 0.00

sqlite>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread Olivier Mascia
> Le 19 juin 2017 à 15:20, Simon Slavin  a écrit :
> 
> On 19 Jun 2017, at 11:13am, Gilles  wrote:
> 
>> It looks running sqlite3.exe in a terminal window (CMD) in Windows 7 doesn't 
>> work: Apparently, it doesn't support UTF-8.
> 
> Correct.  And the "it" that doesn’t support UTF-8 is the Windows console.  
> SQLite works fine and handles everything as Unicode internally.  The Windows 
> console won’t process multibyte characters internally and can’t display them 
> correctly.
> 
> 
> 
> "many legacy applications continue to use character sets based on code pages. 
> Even new applications sometimes have to work with code pages, often for one 
> of the following reasons: […]
>   • To communicate with the Windows Console, which does not support 
> Unicode."
> 
> 
> Some people have found ways to hack around this, but they simulate compliance 
> for a certain codepage rather than implement UTF-8 globally.
> 
> 
> 
> "The Windows Console doesn’t support Unicode."
> 
> 
> 
> Simon.

Switch the console I/O (windows only of course) of sqlite3 shell.c to use 
WriteConsoleW and ReadConsoleW, and there you go and forget about CHCP, 
codepages... Learned it the hard way last year, well after at some point I 
though DBCS would be enough. No.

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia, http://integral.software


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread Hick Gunter
CP1252 = Windows-1252 = ISO 8859-1 aka Latin-1, an extension of ASCII

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Gilles
Gesendet: Montag, 19. Juni 2017 16:23
An: SQLite Maillist 
Betreff: Re: [sqlite] How to search for fields with accents in UTF-8 data?

Found the problem: Turns out the CSV file isn't in UTF8 but in CP1252 :-/

Icon.exe can be used to convert a file before importing it in SQLite.
https://dbaportal.eu/2012/10/24/iconv-for-windows/

Thanks everyone for the help.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread Gilles

Found the problem: Turns out the CSV file isn't in UTF8 but in CP1252 :-/

Icon.exe can be used to convert a file before importing it in SQLite.
https://dbaportal.eu/2012/10/24/iconv-for-windows/

Thanks everyone for the help.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread Simon Slavin


On 19 Jun 2017, at 11:13am, Gilles  wrote:

> It looks running sqlite3.exe in a terminal window (CMD) in Windows 7 doesn't 
> work: Apparently, it doesn't support UTF-8.

Correct.  And the "it" that doesn’t support UTF-8 is the Windows console.  
SQLite works fine and handles everything as Unicode internally.  The Windows 
console won’t process multibyte characters internally and can’t display them 
correctly.



"many legacy applications continue to use character sets based on code pages. 
Even new applications sometimes have to work with code pages, often for one of 
the following reasons: […]
• To communicate with the Windows Console, which does not support 
Unicode."


Some people have found ways to hack around this, but they simulate compliance 
for a certain codepage rather than implement UTF-8 globally.



"The Windows Console doesn’t support Unicode."



Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread Gilles

Thanks everyone.

It looks running sqlite3.exe in a terminal window (CMD) in Windows 7 
doesn't work: Apparently, it doesn't support UTF-8.


And when using DB Browser for SQLite, it does work only if I copy/paste 
the output with the "?" where an accented character lives:

https://s15.postimg.org/e05v2q09n/SQLite.UTF8.accents.query.DB.Browser.for.SQLite.png

Could the problem be with fonts not supporting UTF8?

I'd rather not mess with Windows encoding, especially since one of the 
answers in SuperUser says that Windows only partially support Unicode.


Here's the output of the commands:

sqlite> SELECT unicode(Libreg) FROM MyTable WHERE Libreg LIKE 
'%le-de-France' LIMIT 1;

65533
sqlite> SELECT char(206), unicode('I');
I;73
sqlite> SELECT hex('Ile-de-France');
496C652D64652D4672616E6365
sqlite> SELECT hex(region) FROM MyTable WHERE LIBREG like 
'%le-de-France' LIMIT 1;

Error: no such column: region
sqlite> SELECT hex(libreg) FROM MyTable WHERE LIBREG like 
'%le-de-France' LIMIT 1;

CE6C652D64652D4672616E6365

PS: I might be breaking the thread in the mailing list. For some reason, 
the SQLite mailing lists refuses my post from Nabble although I used the 
same email address to register 1) with Nabble and 2) with the SQLite 
mailing list http://sqlite.1065341.n5.nabble.com/

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread Hick Gunter
What do the following statements return, when run in sqlite3.exe (Please note 
that single quotes are SQLite3 string delimiters):

SELECT hex('Île-de-France');

SELECT hex(region) FROM MyTable WHERE LIBREG like '%le-de-France' LIMIT 1;

I expect one of them is ISO (lead character > 7F) and the other UTF8 (2 
character sequence), so they can never match.
Alternatively, I have also seen "double conversion" ISO -> UTF8 when the 
encoding was already UTF8 but the conversion ISO -> UTF8 was performed anyway.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von CC
Gesendet: Sonntag, 18. Juni 2017 12:53
An: sqlite-users@mailinglists.sqlite.org
Betreff: [sqlite] How to search for fields with accents in UTF-8 data?

Hello

I imported a CSV file where data are encoded in UTF-8.

Some of the characters (like Î) are not available in the ASCII table, so I 
can't use the CLI sqlite3.exe to search.

As an alternative, I tried SQLite Studio, but it fails:

;Returns no record
SELECT COUNT(*) FROM MyTable WHERE REGION="Île-de-France";

;Returns the expected records
SELECT COUNT(*) FROM MyTable WHERE "LIBREG" LIKE "%le-de-France";

I found nothing in SQLite Studio's menus that could be related to encoding so 
that I could tell it the DB contains UTF-8 instead of ANSI.

Is there another Windows application I could try that is more likely to work 
with UTF-8 data?

Thank you.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread Klaas Van B.
For some applications it is, for others not in all cases. For "just" accented 
characters it should be no problem following these instructions.
General advice: download OpenOffice or similar OpenSource packages. They are 
completely free and support almost all OS's.

Kind regards | Vriendelijke groeten | Cordiali saluti,
Klaas `Z4us` van Buiten V, Experienced Freelance ICT-Guy
https://www.linkedin.com/in/klaas-van-buiten-0325b2102


On Mon, 19/6/17, Klaas Van B.  wrote:

 Subject: Re: How to search for fields with accents in UTF-8 data?
 To: "SQLite Maillist" 
 Cc: "CC" 
 Date: Monday, 19 June, 2017, 8:41
 
 You can even make UTF-8 the default encoding
 in Windows as it is in SQLite
 
 
https://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7
 
 CC 
 wrote on Sun, 18 Jun 2017 12:52:33 +0200:
 
 >As an alternative, I tried SQLite
 Studio, but it fails:
 
 ;>Returns no record
 >SELECT COUNT(*) FROM MyTable WHERE
 REGION="Île-de-France";
 
 ;>Returns the expected records
 >SELECT COUNT(*) FROM MyTable WHERE
 "LIBREG" LIKE "%le-de-France";
 
 >I found nothing in SQLite Studio's
 menus that could be related to
 >encoding so that I could tell it
 the DB contains UTF-8 instead of ANSI.
 
 >Is there another Windows
 application I could try that is more likely to
 >work with UTF-8 data?
 
 Kind regards | Vriendelijke groeten |
 Cordiali saluti,
 Klaas `Z4us` van Buiten V, Experienced
 Freelance ICT-Guy
 https://www.linkedin.com/in/klaas-van-buiten-0325b2102
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-19 Thread Klaas Van B.
You can even make UTF-8 the default encoding in Windows as it is in SQLite

https://superuser.com/questions/239810/setting-utf8-as-default-character-encoding-in-windows-7

CC  wrote on Sun, 18 Jun 2017 12:52:33 +0200:

>As an alternative, I tried SQLite Studio, but it fails:

;>Returns no record
>SELECT COUNT(*) FROM MyTable WHERE REGION="Île-de-France";

;>Returns the expected records
>SELECT COUNT(*) FROM MyTable WHERE "LIBREG" LIKE "%le-de-France";

>I found nothing in SQLite Studio's menus that could be related to
>encoding so that I could tell it the DB contains UTF-8 instead of ANSI.

>Is there another Windows application I could try that is more likely to
>work with UTF-8 data?

Kind regards | Vriendelijke groeten | Cordiali saluti,
Klaas `Z4us` van Buiten V, Experienced Freelance ICT-Guy
https://www.linkedin.com/in/klaas-van-buiten-0325b2102
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-18 Thread hfiandor
Dear CC:
I use Spanish language, with letters with accents.

When I try to import from a .csv the data obtained in the table was "rare" and 
I have to implement some routines to convert  "What the program read to the 
letter I want to write", and I solve the problem. I think in a prehistory way.

If you want, I can send to you this routines, and maybe you can suggest me a 
better way.

Yours,
Hfiandor.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to search for fields with accents in UTF-8 data?

2017-06-18 Thread Clemens Ladisch
CC wrote:
> I imported a CSV file where data are encoded in UTF-8.
>
> Some of the characters (like Î) are not available in the ASCII table, so I 
> can't use the CLI sqlite3.exe to search.

The latest version of sqlite3.exe might work.

Anyway, to check that whatever tool you're using uses Unicode correctly, 
execute this:

  SELECT char(206), unicode('Î');

This should output Î and 206.

> ;Returns the expected records
> SELECT COUNT(*) FROM MyTable WHERE "LIBREG" LIKE "%le-de-France";

Does "SELECT unicode(Libreg) FROM MyTable WHERE Libreg LIKE '%le-de-France' 
LIMIT 1"
return the correct code?

> I found nothing in SQLite Studio's menus that could be related to encoding so 
> that I could tell it the DB contains UTF-8 instead of ANSI.

The database API always uses Unicode.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] How to search for fields with accents in UTF-8 data?

2017-06-18 Thread CC

Hello

I imported a CSV file where data are encoded in UTF-8.

Some of the characters (like Î) are not available in the ASCII table, so 
I can't use the CLI sqlite3.exe to search.


As an alternative, I tried SQLite Studio, but it fails:

;Returns no record
SELECT COUNT(*) FROM MyTable WHERE REGION="Île-de-France";

;Returns the expected records
SELECT COUNT(*) FROM MyTable WHERE "LIBREG" LIKE "%le-de-France";

I found nothing in SQLite Studio's menus that could be related to 
encoding so that I could tell it the DB contains UTF-8 instead of ANSI.


Is there another Windows application I could try that is more likely to 
work with UTF-8 data?


Thank you.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users