Areek Zillur created LUCENE-6339:
------------------------------------

             Summary: [suggest] Near real time Document Suggester
                 Key: LUCENE-6339
                 URL: https://issues.apache.org/jira/browse/LUCENE-6339
             Project: Lucene - Core
          Issue Type: New Feature
          Components: core/search
    Affects Versions: 5.0
            Reporter: Areek Zillur
            Assignee: Areek Zillur
             Fix For: 5.0


The idea is to index documents with one or more *SuggestField*(s) and to be 
able to suggest documents whose *SuggestField* value matches a given key.
Each *SuggestField* can be assigned a numeric weight, which is used to score 
the suggestion at query time.

Document suggestion is done against an indexed *SuggestField*. The document 
suggester filters out deleted documents in near real time. It can also filter 
out documents based on a Filter at query time (note: this may change to a 
non-scoring query?).
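
To make the intended behaviour concrete, here is a minimal in-memory sketch of weighted prefix suggestion with query-time filtering. It is not the proposed implementation and uses no Lucene classes: the names ToySuggester, Entry, add and suggest are hypothetical, matching is a plain String.startsWith with no analysis, and the IntPredicate only stands in for "live docs" or a Filter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

/** Toy stand-in for the idea described above; hypothetical, not Lucene code. */
class ToySuggester {

  /** One indexed suggestion: owning document id, suggestion text, weight. */
  static final class Entry {
    final int docId;
    final String value;
    final int weight;

    Entry(int docId, String value, int weight) {
      this.docId = docId;
      this.value = value;
      this.weight = weight;
    }
  }

  private final List<Entry> entries = new ArrayList<>();

  /** Plays the role of adding a suggest field with a weight to a document. */
  void add(int docId, String value, int weight) {
    entries.add(new Entry(docId, value, weight));
  }

  /**
   * Returns up to num entries whose value starts with key, highest weight
   * first, keeping only documents accepted by the predicate (assumed here to
   * model deleted-document and Filter checks).
   */
  List<Entry> suggest(String key, int num, IntPredicate accept) {
    return entries.stream()
        .filter(e -> e.value.startsWith(key))
        .filter(e -> accept.test(e.docId))
        .sorted(Comparator.comparingInt((Entry e) -> e.weight).reversed())
        .limit(num)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    ToySuggester suggester = new ToySuggester();
    suggester.add(0, "title1", 2);
    suggester.add(1, "title2", 5);
    suggester.add(2, "name1", 3);
    // Pretend document 1 has been deleted; only document 0 matches "titl".
    for (Entry e : suggester.suggest("titl", 10, docId -> docId != 1)) {
      System.out.println(e.docId + " " + e.value + " " + e.weight); // prints "0 title1 2"
    }
  }
}
```

The sketch scans every entry on each call, which is only reasonable for illustration; it says nothing about how the postings format stores or looks up suggestions.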

A custom postings format (CompletionPostingsFormat) is used to index 
*SuggestField*s and perform document suggestions.

h4. Usage
{code:java}
// hook up custom postings format
// indexAnalyzer for SuggestField
Analyzer analyzer = ...
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Codec codec = new Lucene50Codec() {
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    if (isSuggestField(field)) {
      return new CompletionPostingsFormat(super.getPostingsFormatForField(field));
    }
    return super.getPostingsFormatForField(field);
  }
};
config.setCodec(codec);

IndexWriter writer = new IndexWriter(dir, config);
// index some documents with suggestions
Document doc = new Document();
doc.add(new SuggestField("suggest_title", "title1", 2));
doc.add(new SuggestField("suggest_name", "name1", 3));
writer.addDocument(doc);
...
// open an NRT reader for the directory
DirectoryReader reader = DirectoryReader.open(writer, false);
// SuggestIndexSearcher is a thin wrapper over IndexSearcher
// queryAnalyzer will be used to analyze the query string
SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);

// suggest 10 documents for "titl" on "suggest_title" field
TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
{code}

h4. Indexing
The index analyzer is set through *IndexWriterConfig*.
{code:java}
new SuggestField(name, suggestion, weight)
{code}

h4. Query
The query analyzer is set through *SuggestIndexSearcher*.
{code:java}
// full options for TopSuggestDocs (TopDocs)
TopSuggestDocs suggest = suggestIndexSearcher.suggest(String field, CharSequence key, int num, Filter filter)

// full options for Collector
// note: only collects, does not score
suggestIndexSearcher.suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector)
{code}

h4. Analyzer
*CompletionAnalyzer* can instead be used to wrap another analyzer in order to 
tune parameters that apply only to suggest fields.
{code:java}
CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer);
completionAnalyzer.setPreserveSep(..);
completionAnalyzer.setPreservePositionsIncrements(..);
completionAnalyzer.setMaxGraphExpansions(..);
{code}



