Make var gap terms index a partial prefix trie
----------------------------------------------

                 Key: LUCENE-2948
                 URL: https://issues.apache.org/jira/browse/LUCENE-2948
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.0
         Attachments: LUCENE-2948.patch

Var gap stores (in an FST) the indexed terms (every 32nd term, by
default), minus their non-distinguishing suffixes.

However, often times the resulting FST is "close" to a prefix trie in
some portion of the terms space.

By allowing some nodes of the FST to store all outgoing edges,
including ones that do not lead to an indexed term, and by recording
that this node is then "authoritative" as to what terms exist in the
terms dict from that prefix, we can get some important benefits:

  * It becomes possible to know that a certain term prefix cannot
    exist in the terms index, which means we can save a disk seek in
    some cases (like PK lookup, docFreq, etc.)

  * We can query for the next possible prefix in the index, allowing
    some MTQs (eg FuzzyQuery) to save disk seeks.

Basically, the terms index is able to answer questions that previously
required seeking/scanning in the terms dict file.


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to