Constantine Yannakopoulos wrote:
> I have a case where the user needs to perform a search in a text column of
> a table with many rows. Typically the user enters the first n matching
> characters as a search string and the application issues a SELECT statement
> that uses the LIKE operator with the search string:
> 
> SELECT * FROM ATable WHERE AColumn LIKE :SearchString || '%'
> 
> According to the LIKE optimization this statement will use an index so it
> will be fast.

FWIW, sqlite only use index if pattern is string constant or placeholder, it
won't try to use index optimization with more complex constant expression.
I.e.
   SELECT * FROM ATable WHERE AColumn LIKE :SearchString
and append % to your search string outside of sql.
(Well, it does not matter, as I think your following question does not have
acceptable solution).

> The application is used by Greek users. The greek alphabet has some letters
> that are visually identical to corresponding latin letters when in
> capitals, (e.g. A, E, I, O, B, H, K, etc), some that are different but
> match deterministically to a single latin letter (e.g. Σ = S, Φ = F, Ω = O
> etc.) and some that don't have a deterministic match (e.g. Γ = Gh or Yi).

You may want to look at ICU extension, but then sqlite won't use index.

Another option may be fts extension (likely, with custom tokenizer function).

> The table contains strings that consist of words that can be written in
> either latin or greek characters; sometimes even mixed (the user changed
> input locale midword before typing the first non-common letter). I have a
> request that the search should match strings that are written with either
> latin or greek or mixed letters, e.g. "MIS" should match "MISKO" (all
> latin), "ΜΙΣΚΟ" (all greek) or "MIΣΚΟ" (first two letters latin, rest
> greek). I thought of using a custom collation that does this type of
> comparison, have the column use that collation and create an index on that
> column to speed up the search but I discovered that the LIKE operator
> either will not use collations other than BINARY and NOCASE (pragma
> case_sensitive_like) or (if overloaded to perform custom matching) will not

I think "will not use index" is a bug that was fixed in latest sqlite version;
however, "ignoring collation" is intended behavior, and not easy to change :-(.

> use an index, and, worse yet, its behaviour will be the same to all string
> comparisons regardless of collation. So, a full table scan seems inevitable.
> I was wondering whether it is realistic to ask for the LIKE operator to use
> by default the assigned collation of a column. I assume that an index on

From first look, it won't be easy. Probably impossible without changing current
sqlite interface for user-defined collation/function/etc. Besides, it will break
compatibility with existing code, so would require some new PRAGMA to enable.

> that column is using by default the specified collation of the column for
> comparisons, so a LIKE clause like the aforementioned can use the index and
> perform a fast search while using the "mixed" comparison I need. This would
> transparently solve my problem and make the case_sensitive_like pragma
> redundant, but for backward compatibility this behaviour could be activated
> by a new pragma.
> 
> Are there any details I am missing that prevent this from being implemented?
> 
> Thanks in advance.
> 
> --Constantine.


RSmith wrote:
[...]
> It will be a matter of finding the most acceptable deficit... Whether it
> be size, time waste, upgrade cost etc.  By the way, I don't think
> upgrading the table schemata need to be a real hard thing... some
> scripts can take care of that in minimum amount of time. (Test them
> thoroughly though). Also, another poster here had developed a full set
> of international collations and comparison mechanisms as a loadable
> extension to SQLite - Nunicode by Aleksey Tulinov I think... link here:
>
> https://bitbucket.org/alekseyt/nunicode

Unfortunately, builtin LIKE ignores collation, and if you override LIKE (as in
ICU extension), sqlite won't use index for optimization.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to