Re: How does wildcard search works under the hood?

Paul Masurel Fri, 08 Sep 2017 17:13:41 -0700

As you said, terms are mapped to the list of documents containing them.
To make that lookup possible, the terms are stored in a finite state
transducer (FST). To simplify, you can picture it as a trie.

When you search for a term, Lucene looks it up in the dictionary and gets
a position at which it can read the matching documents from another file.
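
To make the two-step lookup concrete, here is a minimal sketch in plain
Java. It is not Lucene code: the class name, the HashMap standing in for
the dictionary, and the [count, docId, ...] array layout standing in for
the other file are all assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: a dictionary maps each term to an offset,
// and the documents are read from a separate flat array at that offset.
class PostingsLookupSketch {
    // Stand-in for the postings file. Each list is stored as
    // [count, docId, docId, ...]; this layout is an assumption.
    static final int[] POSTINGS = {2, 1, 4, 3, 0, 1, 7};

    // Stand-in for the term dictionary: term -> offset into POSTINGS.
    static final Map<String, Integer> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put("lucene", 0);
        DICTIONARY.put("search", 3);
    }

    // Step 1: dictionary lookup gives a position.
    // Step 2: read the document ids stored at that position.
    static int[] docsFor(String term) {
        Integer offset = DICTIONARY.get(term);
        if (offset == null) {
            return new int[0]; // term is not in the index
        }
        int count = POSTINGS[offset];
        return Arrays.copyOfRange(POSTINGS, offset + 1, offset + 1 + count);
    }
}
```

An exact-term query touches exactly one dictionary entry, which is why it
is cheap compared to a wildcard query that may expand to many terms.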

When your query contains a wildcard (*, ?), Lucene can use the FST to
identify all of the terms matching the pattern. As with a trie, the FST is
especially efficient for a pattern like someprefix*, and much less so for
patterns like *somesuffix, since it then has to go through the full trie.
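
The prefix versus suffix difference can be sketched with a sorted set of
terms. This is only a stand-in for the FST (the class name and the
"visited" counter are hypothetical, added for illustration): in sorted
order all terms sharing a prefix are contiguous, so a prefix query can
seek and stop early, while a suffix query has to inspect every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical toy dictionary: a sorted set of terms, not Lucene's FST.
class SortedTermsSketch {
    final TreeSet<String> terms = new TreeSet<>();
    int visited; // number of terms inspected by the last query

    // someprefix*: seek to the first term >= prefix, stop at the first
    // term that no longer starts with it.
    List<String> prefixMatches(String prefix) {
        visited = 0;
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            visited++;
            if (!t.startsWith(prefix)) {
                break;
            }
            out.add(t);
        }
        return out;
    }

    // *somesuffix: the sorted order gives no shortcut, so every term
    // in the dictionary is inspected.
    List<String> suffixMatches(String suffix) {
        visited = 0;
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            visited++;
            if (t.endsWith(suffix)) {
                out.add(t);
            }
        }
        return out;
    }
}
```

With a dictionary of millions of terms, the first method inspects only the
matching range plus one term, while the second grows with the size of the
whole dictionary.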

Implementation-wise, this is done by transforming the pattern into a
deterministic finite state automaton and finding the terms in the
dictionary that match it.
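
Here is a rough sketch of that idea in plain Java, not Lucene's actual
code. Assumptions: a simple trie stands in for the FST, only * and ? are
special (no escaping), and instead of compiling a DFA up front the code
tracks the set of reachable pattern positions, which plays the role of one
DFA state. All class and method names are hypothetical. The trie and the
automaton are walked together, and a branch is dropped as soon as the
automaton has no live state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class WildcardTrieSketch {
    // Trie node: children sorted by character, flag marking a full term.
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isTerm;
    }

    final Node root = new Node();

    void add(String term) {
        Node n = root;
        for (char c : term.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.isTerm = true;
    }

    // A '*' at position i may match zero characters, so i + 1 is
    // reachable too (repeated for consecutive stars).
    static Set<Integer> closure(String p, Set<Integer> states) {
        Set<Integer> out = new HashSet<>();
        for (int s : states) {
            int i = s;
            out.add(i);
            while (i < p.length() && p.charAt(i) == '*') {
                i++;
                out.add(i);
            }
        }
        return out;
    }

    // Advance every live pattern position over the character c.
    static Set<Integer> step(String p, Set<Integer> states, char c) {
        Set<Integer> next = new HashSet<>();
        for (int i : states) {
            if (i == p.length()) {
                continue; // pattern already fully consumed
            }
            char pc = p.charAt(i);
            if (pc == '*') {
                next.add(i); // '*' consumes c and stays in place
            } else if (pc == '?' || pc == c) {
                next.add(i + 1);
            }
        }
        return closure(p, next);
    }

    // Returns matching terms in sorted order.
    List<String> matches(String pattern) {
        List<String> out = new ArrayList<>();
        Set<Integer> start = closure(pattern, Collections.singleton(0));
        walk(root, new StringBuilder(), pattern, start, out);
        return out;
    }

    private void walk(Node n, StringBuilder prefix, String p,
                      Set<Integer> states, List<String> out) {
        if (states.isEmpty()) {
            return; // no term below this node can match: prune the branch
        }
        if (n.isTerm && states.contains(p.length())) {
            out.add(prefix.toString());
        }
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            prefix.append(e.getKey());
            walk(e.getValue(), prefix, p, step(p, states, e.getKey()), out);
            prefix.setLength(prefix.length() - 1);
        }
    }
}
```

For a pattern like sea*, every branch that does not start with "sea" is
pruned after at most three characters. For *search, the leading star keeps
a state alive on every branch, so nothing is pruned and the whole trie is
visited, which matches the behaviour described above.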

https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java

On Sat, Sep 9, 2017 at 8:21 AM, alexpusch <a...@getjaco.com> wrote:

> I understand the concept of an inverted index. Terms are mapped to the
> docs containing them. But when the query contains a wildcard (*, ?) we
> cannot simply use an index entry.
>
> How do wildcard searches work under the hood? Is there some other index
> helping Lucene to find all the relevant terms, and then the inverted index
> is used with all of them?
>
> I'm hoping to get a better understanding of this issue to be able to reason
> about the poor performance of the application I work on.
>
> Thanks, Alex.
