On 08/06/2013 02:50 PM, Ankit Murarka wrote:
> This does not seem to help. As per the suggestion, here's what I did:
> a. Indexed the document line by line. Verified with Luke that it is
> actually indexing line by line.
> b. Effectively, each line is a phrase here.
>
> I don't understand how to index this whole phrase as a
> SpellChecker suggestion. When I passed the index as it is, the
> SpellChecker provided only word suggestions rather than
> phrase suggestions.
If you say that the index writer did a good job, then you must have
configured the SpellChecker the wrong way. To avoid guessing at each point
of configuration, I'm sending you a complete working example.

Check these lines:

    // @indexing
    SpellChecker phraseRecommender = new SpellChecker(spellDir);
    IndexReader reader = DirectoryReader.open(dir);
    phraseRecommender.indexDictionary(new LuceneDictionary(reader, REC_FIELD_NAME), iwc, true);

    // @query recommendation
    SpellChecker phraseRecommender = new SpellChecker(spellDir);
    phraseRecommender.setAccuracy(0.3f);

Complete working code:

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.spell.LuceneDictionary;
    import org.apache.lucene.search.spell.SpellChecker;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class PhraseSuggestion {
        public static final String REC_FIELD_NAME = "recommendation";

        public static void main(String[] args) throws IOException {
            RAMDirectory phrasesDir = new RAMDirectory();
            RAMDirectory spellDir = new RAMDirectory();

            // Index time
            indexPhrases(phrasesDir, spellDir,
                    "What have the Romans ever done for us?",
                    "This parrot is no more.",
                    "A tiger... in Africa?",
                    "That Rabbit's Dynamite!!",
                    "Lovely spam! Wonderful spam!",
                    "Spam spam spam spam...",
                    "A duck",
                    "Strange ladies lying in pools distributing swords is no basis for government",
                    "Nobody expects the Spanish Inquisition");

            // Query suggestion time
            SpellChecker phraseRecommender = new SpellChecker(spellDir);
            phraseRecommender.setAccuracy(0.3f);
            System.out.println(getSuggestion("I like spamming with a spam", phraseRecommender));
            System.out.println(getSuggestion("I want parrot and a rabbit", phraseRecommender));
            System.out.println(getSuggestion("rabbit dynamite", phraseRecommender));
            phraseRecommender.close();
        }

        public static void indexPhrases(Directory dir, Directory spellDir, String... phrases)
                throws IOException {
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
                    new StandardAnalyzer(Version.LUCENE_43));
            IndexWriter writer = new IndexWriter(dir, iwc);
            for (String phrase : phrases) {
                addRecommendation(phrase, writer);
            }
            writer.close();

            // Build the spellchecker index from the terms of REC_FIELD_NAME.
            // Each phrase is a single term, so the dictionary holds whole phrases.
            SpellChecker phraseRecommender = new SpellChecker(spellDir);
            IndexReader reader = DirectoryReader.open(dir);
            phraseRecommender.indexDictionary(new LuceneDictionary(reader, REC_FIELD_NAME), iwc, true);
            phraseRecommender.close();
            reader.close();
        }

        private static void addRecommendation(String phrase, IndexWriter writer)
                throws CorruptIndexException, IOException {
            Document doc = new Document();
            // StringField is not tokenized: the whole phrase is indexed as one term
            // instead of being split into single words by the analyzer.
            FieldType ft = new FieldType(StringField.TYPE_NOT_STORED);
            ft.setOmitNorms(false);
            Field f = new Field(REC_FIELD_NAME, phrase, ft);
            doc.add(f);
            writer.addDocument(doc);
        }

        public static String getSuggestion(String query, SpellChecker phraseRecommender)
                throws IOException {
            // Ask for up to 5 indexed phrases similar to the whole query string
            String[] suggestions = phraseRecommender.suggestSimilar(query, 5);
            if (suggestions.length > 0)
                return suggestions[0];
            else
                return null;
        }
    }

It prints:

    Lovely spam! Wonderful spam!
    This parrot is no more.
    That Rabbit's Dynamite!!
Regards,
Ivan Krišto

> On 8/2/2013 7:58 PM, Ivan Krišto wrote:
>> On 08/02/2013 10:16 AM, Ankit Murarka wrote:
>>
>>> Is it possible to implement a complete phrase suggest feature in Lucene
>>> 4.3? If I enter an incorrect phrase, it should suggest a few possible
>>> valid phrases.
>>>
>>> One way could be to get suggestions for each word in the sentence by
>>> calling SpellChecker.suggestSimilar for each word. This can be done,
>>> but it won't help me build a near-correct phrase.
>>>
>>> If I input "Wanna chk Luc Fetre", I will get different spelling
>>> suggestions for each word, but this won't help me build a near-exact
>>> phrase.
>>>
>> I did something similar some time ago (I used Lucene 4.0 trunk before
>> its release, and I don't know if the spellchecker API has changed since then).
>>
>> The idea is simple:
>> - Take a list of valid phrases and index whole phrases as spellchecker
>>   suggestions.
>>
>> My implementation:
>> - As the list of valid phrases, I took queries from a search engine
>>   query log.
>> - At index time, besides saving the phrases, I also saved the number of
>>   occurrences of each phrase.
>> - My phrase suggester would take the 5 phrases most similar to the given
>>   query and return the most common of them.
>> It's very simple and works quite well.
>>
>> A few tips:
>> - Think about when to show a phrase suggestion, e.g. show a suggestion only
>>   if the most common suggested phrase occurs 10 times more often than the
>>   given query.
>> - Explore different distance measures and their parameters.
>> - Maybe it would be good to use only word 3-grams as phrases (if you
>>   have the query "how to use lucene", you would index "how to use" and
>>   "to use lucene" as phrases); then you would "fix" the given query part
>>   by part.
>> - To explore more solutions to this problem, search papers for "related
>>   query suggestion".
>> - Twitter came up with an idea similar to mine:
>>   https://blog.twitter.com/2012/related-queries-and-spelling-corrections-search
>>
>> Regards,
>> Ivan Krišto
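The word 3-gram tip from the quoted message can be sketched in plain Java. This is an illustration, not Ivan's code: it assumes simple whitespace splitting instead of a Lucene analyzer, the class and method names are hypothetical, and keeping queries shorter than three words whole is a choice of this sketch. Inside a Lucene analysis chain, the ShingleFilter from the analyzers-common module is presumably the more natural tool for producing word n-grams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a query into overlapping word 3-grams.
class WordTrigrams {

    /**
     * Returns every run of three consecutive words in the query.
     * Queries with fewer than three words are returned whole (sketch choice).
     */
    static List<String> wordTrigrams(String query) {
        // Assumption: plain whitespace tokenization, no analyzer
        String[] words = query.trim().split("\\s+");
        List<String> grams = new ArrayList<>();
        if (words.length < 3) {
            grams.add(String.join(" ", words));
            return grams;
        }
        // Slide a three-word window over the query
        for (int i = 0; i + 3 <= words.length; i++) {
            grams.add(String.join(" ", words[i], words[i + 1], words[i + 2]));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [how to use, to use lucene]
        System.out.println(wordTrigrams("how to use lucene"));
    }
}
```

Each 3-gram could then be indexed as its own phrase and, at query time, passed to the spellchecker separately, which is what "fixing the query part by part" amounts to.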
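The ranking scheme described in the quoted message (take the 5 phrases most similar to the query, return the most common one, and show it only if it is, say, 10 times more frequent than the query) can be sketched without Lucene. Assumptions: a HashMap of phrase counts stands in for the phrase index, and a plain Levenshtein edit distance over all phrases stands in for SpellChecker.suggestSimilar. Lucene's SpellChecker also uses a Levenshtein-based distance by default, but its n-gram candidate lookup is not reproduced here. All names are hypothetical. A query that never occurred is treated as having count 0, so any candidate passes the ratio test; that is a choice of this sketch, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a phrase index with occurrence counts.
class FrequencyRankedSuggester {
    private final Map<String, Integer> counts = new HashMap<>();

    // Record one occurrence of a phrase (e.g. one query-log entry)
    void addPhrase(String phrase) {
        counts.merge(phrase.toLowerCase(), 1, Integer::sum);
    }

    // Classic dynamic-programming Levenshtein edit distance (two rolling rows)
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the most frequent of the topN phrases closest to the query,
     * or null if it is not at least minRatio times as frequent as the query.
     */
    String suggest(String query, int topN, double minRatio) {
        String q = query.toLowerCase();
        // 1. Keep the topN phrases with the smallest edit distance to the query
        List<String> candidates = new ArrayList<>(counts.keySet());
        candidates.sort(Comparator.comparingInt((String p) -> levenshtein(q, p)));
        List<String> nearest = candidates.subList(0, Math.min(topN, candidates.size()));

        // 2. Among those, pick the most common phrase
        String best = null;
        int bestCount = 0;
        for (String p : nearest) {
            int c = counts.get(p);
            if (c > bestCount) {
                best = p;
                bestCount = c;
            }
        }

        // 3. Suggest only if the best phrase is much more common than the query
        int queryCount = counts.getOrDefault(q, 0);
        if (best == null || best.equals(q) || bestCount < minRatio * queryCount) {
            return null;
        }
        return best;
    }

    public static void main(String[] args) {
        FrequencyRankedSuggester s = new FrequencyRankedSuggester();
        for (int i = 0; i < 12; i++) s.addPhrase("What have the Romans ever done for us");
        s.addPhrase("What have the Romans done");
        for (int i = 0; i < 3; i++) s.addPhrase("This parrot is no more");
        // prints "what have the romans ever done for us"
        System.out.println(s.suggest("what have the romans ever dne for us", 5, 10.0));
        // prints "this parrot is no more"
        System.out.println(s.suggest("this parot is no more", 1, 10.0));
    }
}
```

The linear scan over every phrase is only there to keep the sketch self-contained; with a real query log, the spellchecker's own index would narrow the candidates before the frequency ranking is applied.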