[jira] [Updated] (LUCENE-6339) [suggest] Near real time Document Suggester

Areek Zillur (JIRA) Wed, 04 Mar 2015 15:06:46 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Areek Zillur updated LUCENE-6339:
---------------------------------
    Description: 
The idea is to index documents with one or more *SuggestField*(s) and be able 
to suggest documents with a *SuggestField* value that matches a given key.
A SuggestField can be assigned a numeric weight to be used to score the 
suggestion at query time.

Document suggestion can be done on an indexed *SuggestField*. The document 
suggester can filter out deleted documents in near real-time. The suggester can 
filter out documents based on a Filter (note: may change to a non-scoring 
query?) at query time.

A custom postings format (CompletionPostingsFormat) is used to index 
SuggestField(s) and perform document suggestions.

h4. Usage
{code:java}
  // hook up custom postings format
  // indexAnalyzer for SuggestField
  Analyzer analyzer = ...
  IndexWriterConfig config = new IndexWriterConfig(analyzer);
  Codec codec = new Lucene50Codec() {
    @Override
    public PostingsFormat getPostingsFormatForField(String field) {
      if (isSuggestField(field)) {
        return new CompletionPostingsFormat(super.getPostingsFormatForField(field));
      }
      return super.getPostingsFormatForField(field);
    }
  };
  config.setCodec(codec);
  IndexWriter writer = new IndexWriter(dir, config);
  // index some documents with suggestions
  Document doc = new Document();
  doc.add(new SuggestField("suggest_title", "title1", 2));
  doc.add(new SuggestField("suggest_name", "name1", 3));
  writer.addDocument(doc);
  ...
  // open an nrt reader for the directory
  DirectoryReader reader = DirectoryReader.open(writer, false);
  // SuggestIndexSearcher is a thin wrapper over IndexSearcher
  // queryAnalyzer will be used to analyze the query string
  SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);
  
  // suggest 10 documents for "titl" on "suggest_title" field
  TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
{code}
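
The snippet above calls isSuggestField(field) without defining it. A minimal sketch of one way to write it follows, assuming the suggest field names are known up front; the class name SuggestFieldNames and the fixed set of names are hypothetical and not part of the patch.

```java
import java.util.Set;

// Hypothetical helper: the issue text calls isSuggestField(field) but never defines it.
final class SuggestFieldNames {
    // Assumption: suggest fields are identified by exact name, using the
    // two field names from the usage example.
    private static final Set<String> SUGGEST_FIELDS = Set.of("suggest_title", "suggest_name");

    // Returns true only for fields that should use the completion postings format.
    static boolean isSuggestField(String field) {
        // Set.of(...) rejects null lookups, so guard against null explicitly.
        return field != null && SUGGEST_FIELDS.contains(field);
    }

    public static void main(String[] args) {
        System.out.println(isSuggestField("suggest_title")); // true
        System.out.println(isSuggestField("body"));          // false
    }
}
```

A naming convention (for example a shared "suggest_" prefix) would work equally well; the only requirement is that the codec routes exactly the suggest fields to CompletionPostingsFormat.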

h4. Indexing
The index analyzer is set through *IndexWriterConfig*.
{code:java}
new SuggestField(name, suggestion, weight)
{code}

h4. Query
The query analyzer is set through *SuggestIndexSearcher*.
Hits are collected in descending order of the suggestion's weight.
{code:java}
// full options for TopSuggestDocs (TopDocs)
TopSuggestDocs suggest = suggestIndexSearcher.suggest(String field, CharSequence key, int num, Filter filter)

// full options for Collector
// note: this variant only collects hits; it does not score them
suggestIndexSearcher.suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector)
{code}
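
The query semantics described above (match a key against the suggestion, drop filtered documents, return the top hits by descending weight) can be modelled in plain Java. This is only an illustrative sketch: it uses a linear scan with raw String.startsWith, whereas the patch relies on a custom postings format and an analyzer. The names WeightedPrefixSketch, Entry and the IntPredicate standing in for the Filter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Plain-Java model of the described behaviour; not the Lucene implementation.
final class WeightedPrefixSketch {

    // One indexed suggestion: owning document id, suggestion text, weight.
    static final class Entry {
        final int docId;
        final String text;
        final long weight;

        Entry(int docId, String text, long weight) {
            this.docId = docId;
            this.text = text;
            this.weight = weight;
        }
    }

    // Returns up to num entries whose text starts with key and whose document
    // passes the accept predicate, ordered by descending weight.
    static List<Entry> suggest(List<Entry> entries, String key, int num, IntPredicate accept) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.text.startsWith(key) && accept.test(e.docId)) {
                hits.add(e);
            }
        }
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        int limit = Math.min(Math.max(num, 0), hits.size());
        return new ArrayList<>(hits.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry(0, "title1", 2),
            new Entry(1, "title2", 5),
            new Entry(2, "other", 9),
            new Entry(3, "title3", 7));
        // Pretend document 3 was deleted or rejected by the filter.
        for (Entry e : suggest(entries, "titl", 10, docId -> docId != 3)) {
            System.out.println(e.text + " " + e.weight);
        }
    }
}
```

In this model, the predicate plays the role of both the deleted-document check and the optional Filter; how the patch combines the two is not shown here.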

h4. Analyzer
Alternatively, *CompletionAnalyzer* can wrap another analyzer to tune 
parameters that apply only to suggest fields.
{code:java}
CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer);
completionAnalyzer.setPreserveSep(..)
completionAnalyzer.setPreservePositionsIncrements(..)
completionAnalyzer.setMaxGraphExpansions(..)
{code}


> [suggest] Near real time Document Suggester
> -------------------------------------------
>
>                 Key: LUCENE-6339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6339
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.0
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: 5.0
>
>         Attachments: LUCENE-6339.patch


