Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
Hi, It was exactly like you said, my bad, so now I have built an icu version. BUT unfortunately it still does not support CJK, why is that ? qiulangs-MacBook-Pro:sqlite-autoconf-3250100 qiulang$ ./sqlite3 SQLite version 3.25.1 2018-09-18 20:20:44 Enter ".help" for usage hints. Connected to a

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Jens Alfke
> On Sep 20, 2018, at 11:01 PM, 邱朗 wrote: > > https://www.sqlite.org/fts5.html said " > The unicode tokenizer classifies all unicode characters as either "separator" > or "token" characters. By default all space and punctuation characters, as > defined by

Re: [sqlite] FTS5 minimum number of characters to index ?

2018-09-21 Thread Jens Alfke
> On Sep 21, 2018, at 3:26 AM, Domingo Alvarez Duarte > wrote: > > looking at some fts5 tables it seems that an option to limit the minimum > number of characters to at least 2 or 3 would be a good shot as stopwords, A real stop-word list is valuable, but I don’t think a simple

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy
On 09/21/2018 09:44 PM, 邱朗 wrote: I actually first used ./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config --cppflags`" LDFLAGS="`icu-config --ldflags`" But I got the error When you ran this configure command, is the first line of output something like the following? bash:

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
I actually first used ./configure CFLAGS="-DSQLITE_ENABLE_ICU `icu-config --cppflags`" LDFLAGS="`icu-config --ldflags`" But I got the error sqlite3.c:184184:10: fatal error: 'unicode/utypes.h' file not found #include <unicode/utypes.h> Then I added -I -L switches and if I remember correctly I used brew to

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy
On 09/21/2018 05:21 PM, 邱朗 wrote: Hi, Thanks for replying to my question. The following is the error I got when compiling sqlite-autoconf-3250100.tar.gz. The error looks similar to this old discussion http://sqlite.1065341.n5.nabble.com/compiling-Sqlite-with-ICU-td40641.html I am using macOS

[sqlite] FTS5 min_word_size patch small error

2018-09-21 Thread Domingo Alvarez Duarte
Hello! In my last post about a patch to fts5 to add an option "min_word_size" there is a small mistake in the comparison: Original with mistake: if(p->nMinWordSize && p->nMinWordSize >= wsz) continue; New with mistake fixed (it should be ">" instead of ">="): if(p->nMinWordSize &&
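The effect of the ">=" versus ">" comparison described in this post can be sketched in a few lines of Python. This is only an illustration of the off-by-one, not the patch itself: the function names are hypothetical, and it assumes that `wsz` in the patch is the length of the current word and that `continue` means the word is skipped (not indexed).

```python
def keep_with_original_check(word: str, min_word_size: int) -> bool:
    """Mirror of the original comparison: skip when min_word_size >= word length."""
    skip = bool(min_word_size) and min_word_size >= len(word)
    return not skip


def keep_with_fixed_check(word: str, min_word_size: int) -> bool:
    """Mirror of the corrected comparison: skip only when min_word_size > word length."""
    skip = bool(min_word_size) and min_word_size > len(word)
    return not skip


if __name__ == "__main__":
    for word in ("to", "cat", "house"):
        print(word,
              keep_with_original_check(word, 3),
              keep_with_fixed_check(word, 3))
```

With a minimum of 3, the original check also drops words of exactly three characters, while the corrected check keeps them and drops only shorter words. A minimum of 0 disables the filter in both versions.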

[sqlite] FTS5 min_word_size patch

2018-09-21 Thread Domingo Alvarez Duarte
Hello! After reporting here previously about this issue I've got a working implementation of a "min_word_size" option for Unicode61Tokenizer, see patch below. With it here is the result of a simple test: ./sqlite3 SQLite version 3.26.0 2018-09-20 20:43:28 Enter ".help" for usage hints.

Re: [sqlite] SQlite 3 - bottleneck with rbuFindMaindb

2018-09-21 Thread Simon Slavin
On 20 Sep 2018, at 10:31pm, Roger Cuypers wrote: > rbuFindMaindb > rbuVfsAccess > sqlite3OsAccess > hasHotJournal > sqlite3PagerSharedLock > zipvfsLockFile Thanks. That's very useful. Your stack includes both zipvfsLockFile and rbuVfsAccess, and I'm not familiar with either of these. So I

[sqlite] FTS5 minimum number of characters to index ?

2018-09-21 Thread Domingo Alvarez Duarte
Hello! I'm looking in the documentation and it doesn't seem to mention any option to specify a minimum number of characters to index. Looking at some fts5 tables, it seems that an option to limit the minimum number of characters to at least 2 or 3 would be a good shot as stopwords, another

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
Hi, Thanks for replying to my question. The following is the error I got when compiling sqlite-autoconf-3250100.tar.gz. The error looks similar to this old discussion http://sqlite.1065341.n5.nabble.com/compiling-Sqlite-with-ICU-td40641.html I am using macOS 10.13 & xcode 10 Undefined symbols

Re: [sqlite] SQlite 3 - bottleneck with rbuFindMaindb

2018-09-21 Thread Roger Cuypers
Ok, I have more info now. The database consists of multiple individual database files which are opened and closed individually, each with their own connection, multiple at a time. There is a root file but it's just another database file whose only purpose is to tell the application where to find

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Dan Kennedy
On 09/21/2018 01:38 PM, 邱朗 wrote: I think it could be made to work, or at least, I have experience making it work with CJK based on functionality exposed via ICU. I don't know if the unicode tokenizer uses ICU or if the functionality in ICU that I used is available in the unicode tables. Not

Re: [sqlite] Docs typo JSON1 @ 4.13

2018-09-21 Thread John G
In that same JSON page, in 1. Overview the text mentions '12 of 14 SQL functions' but the listing shows different numbers - 13 numbered items in the first section, 2 in the second, numbered 1 - 15. Should that be "twelve of the *fifteen* SQL functions" or "*thirteen* of the *fifteen* SQL

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Scott Robison
On Fri, Sep 21, 2018 at 12:39 AM 邱朗 wrote: > > >I think it could be made to work, or at least, I have experience > >making it work with CJK based on functionality exposed via ICU. I > >don't know if the unicode tokenizer uses ICU or if the functionality > >in ICU that I used is available in the

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
> >I think it could be made to work, or at least, I have experience >making it work with CJK based on functionality exposed via ICU. I >don't know if the unicode tokenizer uses ICU or if the functionality >in ICU that I used is available in the unicode tables. Not >understanding any of the

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread Scott Robison
On Fri, Sep 21, 2018 at 12:02 AM 邱朗 wrote: > > https://www.sqlite.org/fts5.html said " The unicode tokenizer classifies all > unicode characters as either "separator" or "token" characters. By default > all space and punctuation characters, as defined by Unicode 6.1, are > considered

Re: [sqlite] Why sqlite fts5 Unicode61 Tokenizer does not support CJK(Chinese Japanese Krean)?

2018-09-21 Thread 邱朗
https://www.sqlite.org/fts5.html said " The unicode tokenizer classifies all unicode characters as either "separator" or "token" characters. By default all space and punctuation characters, as defined by Unicode 6.1, are considered separators, and all other characters as token characters... "
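The separator/token rule quoted from the FTS5 documentation can be modelled with a short Python sketch. This is a simplified model, not SQLite's implementation: it assumes "space and punctuation" maps to Unicode general categories starting with Z or P (plus ordinary whitespace), and the function names are made up for illustration. It shows why a run of CJK characters, which contains no spaces, comes out as a single token.

```python
import unicodedata


def is_separator(ch: str) -> bool:
    """Assumed rule: whitespace and Unicode Z*/P* categories separate tokens."""
    return ch.isspace() or unicodedata.category(ch)[0] in ("Z", "P")


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased runs of non-separator characters."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if is_separator(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch.lower())
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(tokenize("Hello, world!"))
    print(tokenize("我爱北京天安门"))
    print(tokenize("全文检索,很有用"))
```

Under this model the English sentence splits into two words, but the Chinese sentence stays one long token because every ideograph is a letter and nothing separates them. Only the fullwidth comma in the last example produces a break. Searching for a single word inside such a token would therefore not match unless the tokenizer does real word segmentation.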