Hello,

This does not seem to help. As per the suggestion, here's what I did:

a. Indexed the document line by line. Verified in Luke that it is actually indexed line by line.
b. Effectively, each line is a phrase here.

I don't understand how to index a whole phrase as a SpellChecker suggestion. When I passed the index as it is, SpellChecker returned only word suggestions rather than phrase suggestions.

There has to be some different way of indexing a whole phrase as a SpellChecker suggestion. Please note that the phrases were extracted from the document by indexing it line by line, so each phrase is actually a line.

On 8/2/2013 7:58 PM, Ivan Krišto wrote:
On 08/02/2013 10:16 AM, Ankit Murarka wrote:
Is it possible to implement a complete phrase suggest feature in Lucene
4.3? If I enter an incorrect phrase, it should suggest a few possible
valid phrases.

One way could be to get a suggestion for each word in the sentence by
calling SpellChecker.suggestSimilar for each word. This can be done,
but it won't help me build a near-exact phrase.

If I input "Wanna chk Luc Fetre" then I will get different spelling
suggestions for each word, but that won't help me build a near-exact
phrase.
I did something similar some time ago (I've used Lucene 4.0 trunk before
its release, and I don't know if spellchecker API changed since then).

The idea is simple:
- Take a list of valid phrases and index whole phrases as spellchecker
suggestions.

My implementation:
- As the list of valid phrases I took queries from a search engine query log.
- At index time, besides saving the phrases, I also saved the number of
occurrences of each phrase.
- My phrase suggestion took the 5 phrases most similar to the given query
and returned the most common of them from the index.
It's very simple and works quite well.
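
A minimal sketch of the steps above, in plain Java without Lucene. The
class name PhraseSuggester, its methods, and the sample phrases are
hypothetical, and the hand-written Levenshtein distance is only a
stand-in for whatever distance measure the spellchecker index would
apply. Each whole phrase is treated as a single entry, the k nearest
phrases are picked, and the most frequent one wins:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PhraseSuggester {
    // phrase -> number of times it was seen (e.g. in a query log; assumption)
    private final Map<String, Integer> counts = new HashMap<>();

    // Record one occurrence of a whole phrase (lower-cased).
    void add(String phrase) {
        counts.merge(phrase.toLowerCase(), 1, Integer::sum);
    }

    // Levenshtein edit distance computed over the whole phrase string.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Take the k phrases closest to the query, then return the most
    // frequent of those. Returns null if nothing has been added.
    String suggest(String query, int k) {
        String q = query.toLowerCase();
        List<String> candidates = new ArrayList<>(counts.keySet());
        // Sort by distance; ties are broken alphabetically for determinism.
        candidates.sort(Comparator
                .comparingInt((String p) -> editDistance(q, p))
                .thenComparing(Comparator.naturalOrder()));
        String best = null;
        for (String p : candidates.subList(0, Math.min(k, candidates.size()))) {
            if (best == null || counts.get(p) > counts.get(best)) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PhraseSuggester s = new PhraseSuggester();
        for (int i = 0; i < 3; i++) {
            s.add("check lucene feature");
        }
        s.add("check lucene features");
        System.out.println(s.suggest("chk lucene feature", 5));
    }
}
```

This scans every stored phrase, which is fine for a small list; a real
spellchecker index would narrow the candidates first.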

A few tips:
- Think about when to show a phrase suggestion, e.g. show a suggestion
only if the most common suggested phrase occurs 10 times more often than
the given query.
- Explore different distance measures and their parameters.
- Maybe it would be good to use only word 3-grams as phrases (if you
have the query "how to use lucene", you would index "how to use" and "to
use lucene" as phrases), so that you would then "fix" the given query in
parts.
- To explore more solutions to this problem, search for papers on
"related query suggestion".
- Twitter arrived at a similar idea to mine:
https://blog.twitter.com/2012/related-queries-and-spelling-corrections-search
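
Two of the tips above can be sketched in plain Java. The class
PhraseTips and both method names are hypothetical. Treating a query
that was never seen as if it had occurred once is my assumption, not
something stated above; the factor of 10 from the first tip is passed
in as a parameter:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PhraseTips {
    // Split a query into overlapping word n-grams. A query with n words
    // or fewer is returned as a single phrase.
    static List<String> wordNGrams(String query, int n) {
        List<String> grams = new ArrayList<>();
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return grams;
        }
        String[] words = trimmed.split("\\s+");
        if (words.length <= n) {
            grams.add(String.join(" ", words));
            return grams;
        }
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    // Show the suggestion only if it occurs at least `factor` times as
    // often as the query itself. An unseen query (count 0) is treated
    // as count 1 (assumption).
    static boolean shouldShow(long suggestionCount, long queryCount, long factor) {
        return suggestionCount >= factor * Math.max(queryCount, 1);
    }

    public static void main(String[] args) {
        System.out.println(wordNGrams("how to use lucene", 3));
        System.out.println(shouldShow(100, 10, 10));
    }
}
```

Each 3-gram would then be indexed and corrected as its own phrase, so a
long query gets fixed piece by piece.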


   Regards,
     Ivan Krišto



--
Regards

Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what 
lies within us"

