2007/12/4 Ingo Godau-Gellert <[EMAIL PROTECTED]>:
> What is really strange is that FTS3 search phrases like
> SELECT referenzcode FROM volltext where volltext match ('installation
> manual') are performed really fast within some milliseconds, independent
> to the search phrase.
> But in general I allow the user to enter a search word in a dedicated
> form field (Windows computers), the search starts after entering of each
> additional character.
>
> That means:
> Entering the word "installation" the search starts after entering the
> character "i". Then, after entering the second character "n" the search
> field is "in" and starts again. Then the user enters "s" and the search
> is interrupted and starts again with search word "ins".
>
> To be able to find not only table entries containing "i", "in", "ins",
> "inst", ... there is automatically added the character "*" - in fact the
> search phrase is "i*", "in*", "ins"*, "inst*", ...
>
> Now it's very interesting that a search phrase containing at least 4
> characters causes a search time of max. some seconds.

In general, the more specific the term is, the faster the search will
be.  Very unlikely terms will be faster than common terms.

If you do a search for 'installation', then fts can dig directly down
to the information for 'installation' and serve it up.  But if you
search for 'i*', it's as if you're searching for all terms starting
with 'i', so fts has to find all those terms and merge their
information together.  This is always going to be slower than a query
for a specific term within that range of terms.

The specific reasons why you're seeing slow performance, and the
specific things you see slow performance on, are very dependent on the
data in your system.  In general, more is slower than less.  So in
cases where you have a large number of results, it will generally be
slower.

> The search of "E5*" f.e takes 2,7 seconds and is finding 24419 entries.
> But if the search f.e. is "F1*" the search takes around 1128,5 seconds
> to find 77652 entries.

This seems a little excessive, though.  I do see that there's an
O(N^2) path in the prefix-searching (loadSegmentLeavesInt()'s call to
docListUnion()).  I can reasonably make that O(logN), which might help
a great deal, if you're hitting it.  Not really sure how to tell if
you're hitting it, but I'll experiment at my end and see whether I can
improve things there.

> It's clear to me that the search time in some cases takes longer as in
> other cases. Especially I would expect that the search takes as longer
> as the amount of found entries is bigger.
>
> But is there anyone who could explain me, why "F1*" takes 1128,5 seconds
> search time? Or "F3*" takes 202,8s? What is the reason for such a long
> duration?

If there are a lot of hits distributed across a few terms, that might
be much faster than fewer hits distributed across a large number of
terms.

> In my opinion a short search phrase with "*" should be very fast.

:-).  Prefix searches will likely always be slower than specific
searches, because they're essentially implemented as a set of specific
searches OR'ed together.  Better than that, of course, but still
strictly slower than a single specific search.

> Is there any possibility to solve this behaviour? Today I tried every
> possible search phrase combination with 2 characters only, noticed the
> search time and decided to use FTS search only in case the search will
> likely take less than 30 seconds. As soon as the search phrase will take
> longer than 30 seconds I use the standard SQLITE3 search algorithm.
> That's a workaround for today, but I consider someone being here who
> could improve the algorithm behaviour of FTS3?

SQLite has a function sqlite3_interrupt() which you can use to
interrupt statements.  Since fts3 is implemented in terms of SQLite,
you _should_ be able to call sqlite3_interrupt() to cause it to stop
executing.  If your code is asynchronous, then you could start the
query in the background, and call sqlite3_interrupt() after awhile to
stop execution.

There are a couple different things I've considered for fts to
implement to help out with this case.  One thing would be a "limit"
feature, so you could say something like:

   SELECT rowid FROM t WHERE t MATCH 'a* limit:10';

Why this is helpful is that it tells fts that it only needs to
generate 10 hits, which it might be able to optimize to be very fast.

Another thing I've considered is to add some sort of prefix-specific
index to the system.  I'm not even clear what this would mean, so I'm
not going to describe it :-).  But it would probably be something in
the CREATE statement which indicated that you expected to do prefix
queries and wanted fts to keep additional information to make those
fast.  OR, it might be a completely distinct table which only keeps an
index to make prefix-searching fast.

One reason you might find full-table-scans to sometimes be faster than
fts is if fts ends up essentially looking at most documents.  In that
case, the fts stuff isn't going to be particularily efficient.  There
may be ways to take advantage of this entirely within fts, so that you
can use a single syntax and have it automatically figure out how to
optimize the query.

-scott

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to