[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Areek Zillur updated LUCENE-6339:
---------------------------------
Description:

The idea is to index documents with one or more *SuggestField*(s) and be able to suggest documents with a *SuggestField* value that matches a given key.
A SuggestField can be assigned a numeric weight to be used to score the suggestion at query time.

Document suggestion can be done on an indexed *SuggestField*. The document suggester can filter out deleted documents in near real-time. The suggester can filter out documents based on a Filter (note: may change to a non-scoring query?) at query time.

A custom postings format (CompletionPostingsFormat) is used to index SuggestField(s) and perform document suggestions.

h4. Usage
{code:java}
// hook up custom postings format
// indexAnalyzer for SuggestField
Analyzer analyzer = ...
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Codec codec = new Lucene50Codec() {
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    if (isSuggestField(field)) {
      return new CompletionPostingsFormat(super.getPostingsFormatForField(field));
    }
    return super.getPostingsFormatForField(field);
  }
};
config.setCodec(codec);
IndexWriter writer = new IndexWriter(dir, config);

// index some documents with suggestions
Document doc = new Document();
doc.add(new SuggestField("suggest_title", "title1", 2));
doc.add(new SuggestField("suggest_name", "name1", 3));
writer.addDocument(doc);
...

// open an nrt reader for the directory
DirectoryReader reader = DirectoryReader.open(writer, false);
// SuggestIndexSearcher is a thin wrapper over IndexSearcher
// queryAnalyzer will be used to analyze the query string
SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);

// suggest 10 documents for "titl" on "suggest_title" field
TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
{code}

h4. Indexing
The index analyzer is set through *IndexWriterConfig*.
{code:java}
new SuggestField(name, suggestion, weight)
{code}

h4. Query
The query analyzer is set through *SuggestIndexSearcher*.
Hits are collected in descending order of the suggestion's weight.
{code:java}
// full options for TopSuggestDocs (TopDocs)
TopSuggestDocs suggest = suggestIndexSearcher.suggest(String field, CharSequence key, int num, Filter filter)

// full options for Collector
// note: only collects, does not score
suggestIndexSearcher.suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector)
{code}

h4. Analyzer
*CompletionAnalyzer* can be used instead to wrap another analyzer and tune parameters that apply only to suggest fields.
{code:java}
CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer);
completionAnalyzer.setPreserveSep(..)
completionAnalyzer.setPreservePositionsIncrements(..)
completionAnalyzer.setMaxGraphExpansions(..)
{code}

was:

The idea is to index documents with one or more *SuggestField*(s) and be able to suggest documents with a *SuggestField* value that matches a given key.
Individual *SuggestField* can be assigned a numeric weight to be used to score the suggestion at query time.

Document suggestion can be done on an indexed *SuggestField*. The document suggester can filter out deleted documents in near real-time. The suggester can filter out documents based on a Filter (note: may change to a non-scoring query?) at query time.

A custom postings format (CompletionPostingsFormat) is used to index *SuggestField*s and perform document suggestions.

h4. Usage
{code:java}
// hook up custom postings format
// indexAnalyzer for SuggestField
Analyzer analyzer = ...
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Codec codec = new Lucene50Codec() {
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    if (isSuggestField(field)) {
      return new CompletionPostingsFormat(super.getPostingsFormatForField(field));
    }
    return super.getPostingsFormatForField(field);
  }
};
config.setCodec(codec);
IndexWriter writer = new IndexWriter(dir, config);

// index some documents with suggestions
Document doc = new Document();
doc.add(new SuggestField("suggest_title", "title1", 2));
doc.add(new SuggestField("suggest_name", "name1", 3));
writer.addDocument(doc);
...

// open an nrt reader for the directory
DirectoryReader reader = DirectoryReader.open(writer, false);
// SuggestIndexSearcher is a thin wrapper over IndexSearcher
// queryAnalyzer will be used to analyze the query string
SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);

// suggest 10 documents for "titl" on "suggest_title" field
TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
{code}

h4. Indexing
Index analyzer set through *IndexWriterConfig*
{code:java}
new SuggestField(name, suggestion, weight)
{code}

h4. Query
Query analyzer set through *SuggestIndexSearcher*.
Hits are collected in descending order of the suggestion's weight
{code:java}
// full options for TopSuggestDocs (TopDocs)
TopSuggestDocs suggest = suggestIndexSearcher.suggest(String field, CharSequence key, int num, Filter filter)

// full options for Collector
// note: only collects, does not score
suggestIndexSearcher.suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector)
{code}

h4. Analyzer
*CompletionAnalyzer* can be used instead to wrap another analyzer to tune suggest field only parameters.
{code:java}
CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer);
completionAnalyzer.setPreserveSep(..)
completionAnalyzer.setPreservePositionsIncrements(..)
completionAnalyzer.setMaxGraphExpansions(..)
{code}


> [suggest] Near real time Document Suggester
> -------------------------------------------
>
>                 Key: LUCENE-6339
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6339
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.0
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: 5.0
>
>         Attachments: LUCENE-6339.patch