As you said the terms are mapped to a list of documents containing them. To do so, they are stored in a finite state transducer. For simplification, you can imagine it as a trie.
When you search for a term, Lucene does a lookup into the dictionary, gets a position at which it can read documents in another file. When you search for a document containing a ?, Lucene can use the fst to identify all of the terms matching the pattern. Like for a trie, the fst is especially potent at a pattern like someprefix* and not as great for patterns like *somesuffix as it will have to go through the full trie. Implementation-wise this is done by transforming the pattern into a deterministic finite state automaton, and finding the terms that match the pattern in the dictionary. https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/lucene/core/src/java/org/apache/lucene/search/AutomatonQuery.java On Sat, Sep 9, 2017 at 8:21 AM, alexpusch <a...@getjaco.com> wrote: > I understand the concept of an inverted index. Terms are mapped the the > docs > containing them. But when the query contains a wildcard (*, ?) we cannot > simply use an index entry. > > How does wildcard searches work under the hood? Is there some other index > helping Lucene to find all the relevant terms, and then the inverted index > is used with all of them? > > I'm hoping to get a better understanding of this issue to be able to reason > poor performance of the application I work on > > Thanks, Alex. > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Lucene-General-f642108.html >