Re: Question for Wildcard Search:

Volodymyr Bychkoviak Thu, 23 Jun 2005 03:27:26 -0700

Hello,
About three months ago I posted an idea about wildcard searching.

The main idea was to index every character of the input as a separate term and then search using PhraseQuery. For example, the word "12345" would be indexed as "1" "2" "3" "4" "5". To find "*23*" you can use a PhraseQuery with these two terms ("2" "3"). But this approach is limited to queries with wildcards only at the beginning or end.
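To make the idea concrete, here is a minimal sketch in plain Python rather than Lucene. It only imitates what a positional index and a phrase query do; the names build_char_index and phrase_search are made up for illustration and are not Lucene APIs.

```python
from collections import defaultdict

def build_char_index(docs):
    """Map each character to {doc_id: [positions]}, treating every character as a term."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            index[ch][doc_id].append(pos)
    return index

def phrase_search(index, chars):
    """Return sorted doc ids where the characters occur at consecutive positions."""
    if not chars:
        return []
    hits = []
    for doc_id, starts in index.get(chars[0], {}).items():
        for start in starts:
            # Every following character must sit exactly i positions after the first.
            if all(start + i in index.get(ch, {}).get(doc_id, ())
                   for i, ch in enumerate(chars)):
                hits.append(doc_id)
                break
    return sorted(hits)

# Hypothetical sample documents, each a single word.
docs = {1: "12345", 2: "13245", 3: "99923"}
index = build_char_index(docs)
print(phrase_search(index, "23"))  # the query *23*; prints [1, 3]
```

The lookup touches only the posting lists for "2" and "3", no matter how many distinct words the index holds.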

Later I did some research and wrote an extension to PhraseQuery that allows a term's relative position to be set to a range of values (to insert gaps for "*" and "?"). This approach is good because it does not rewrite queries and never runs into OutOfMemory or TooManyClauses exceptions.
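A rough sketch of how position ranges could work, again in plain Python and not the actual extension code. It assumes the pattern is unanchored (as if wrapped in "*" on both sides), so leading and trailing wildcards are ignored. The helper names parse_pattern and matches are hypothetical.

```python
from collections import defaultdict

def parse_pattern(pattern):
    """Turn a wildcard pattern into (char, min_gap, max_gap) steps.

    The gap is the distance from the previous literal character's position.
    "?" adds exactly one position, "*" makes the gap unbounded (max_gap None).
    """
    steps, min_gap, unbounded = [], 1, False
    for ch in pattern:
        if ch == "?":
            min_gap += 1
        elif ch == "*":
            unbounded = True
        else:
            steps.append((ch, min_gap, None if unbounded else min_gap))
            min_gap, unbounded = 1, False
    return steps

def matches(text, pattern):
    """Check one document's character positions against the gap constraints."""
    steps = parse_pattern(pattern)
    if not steps:
        return True
    pos = defaultdict(list)
    for i, ch in enumerate(text):
        pos[ch].append(i)
    # Positions where the most recently matched term can sit.
    current = set(pos.get(steps[0][0], ()))
    for ch, min_gap, max_gap in steps[1:]:
        current = {
            p for p in pos.get(ch, ())
            if any(p - q >= min_gap and (max_gap is None or p - q <= max_gap)
                   for q in current)
        }
        if not current:
            return False
    return bool(current)

print(matches("12345", "2?4"))  # True: "4" is exactly two positions after "2"
print(matches("12345", "2?5"))  # False: "5" is three positions after "2"
print(matches("12345", "1*5"))  # True: any gap is allowed between "1" and "5"
```

The number of posting lists read equals the number of literal characters in the pattern, so nothing is expanded into a long list of clauses.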


regards,
Volodymyr Bychkoviak


14.03.2005 13:54

Dave Kor wrote:

Quoting Dave Kor <[EMAIL PROTECTED]>:

Quoting Erik Hatcher <[EMAIL PROTECTED]>:

Anyone tried this technique with Lucene?

Actually, the problem is that the wildcard code has to search over a large
subset of terms because the list of terms is, well, a linear structure.

If, for example, all terms in the index are arranged as a suffix tree, the
sort of wildcard search that is currently CPU intensive will no longer be
CPU intensive.


Hmm, I realized I should add a qualifier to the above statement. Searching for
matching terms would no longer be CPU intensive, especially for wildcards like
*foo* or *foo. The other wildcard search problem of having too many matching
terms to look up in the index still remains unsolved.
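
To illustrate the point about term lookup, a small sketch follows. It is not a real suffix tree: it uses a sorted list of all term suffixes with binary search as a simpler stand-in, and it is plain Python, not anything Lucene provides. The function names and sample terms are invented for the example.

```python
import bisect

def build_suffix_list(terms):
    """Return a sorted list of (suffix, term) pairs for every suffix of every term."""
    return sorted((term[i:], term) for term in terms for i in range(len(term)))

def terms_containing(suffixes, infix):
    """Find terms matching *infix*: each such term has a suffix starting with infix."""
    # (infix,) sorts before every (infix..., term) pair, so this is the first candidate.
    lo = bisect.bisect_left(suffixes, (infix,))
    found = set()
    for suffix, term in suffixes[lo:]:
        if not suffix.startswith(infix):
            break  # matching suffixes are contiguous in sorted order
        found.add(term)
    return sorted(found)

# Hypothetical term dictionary.
suffixes = build_suffix_list(["foo", "foobar", "barfoo", "bar", "food"])
print(terms_containing(suffixes, "foo"))  # prints ['barfoo', 'foo', 'foobar', 'food']
```

Finding the matching terms becomes a binary search plus a short scan instead of a pass over the whole term list. The returned list can still be large, which is the unsolved part mentioned above.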

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


