[
https://issues.apache.org/jira/browse/LUCENE-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669906#action_12669906
]
Mark Harwood commented on LUCENE-1532:
--------------------------------------
Surely the biggest factor in picking the right spelling suggestion is the
other words the user has typed in the query? A quick search tells me the
average Google query is 4 words long (I used 4 words to find this out).
Measuring co-occurrence of variants with the other query words seems like a
much more useful measure than considering the IDF of isolated word variants.
This may be useful:
http://issues.apache.org/jira/browse/LUCENE-474?focusedCommentId=12358701#action_12358701
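
To make the co-occurrence idea concrete, here is a minimal sketch in plain
Python rather than against the Lucene API. It assumes a small in-memory list
of documents, and the function names (build_cooccurrence,
cooccurrence_score, rank_suggestions) are hypothetical, chosen only for
illustration. Each candidate spelling is scored by how many documents it
shares with the other words in the query.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count, for each unordered word pair, how many documents contain both."""
    pair_counts = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair is stored once, in a stable order.
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def cooccurrence_score(candidate, context_words, pair_counts):
    """Sum the co-occurrence counts of a candidate with the other query words."""
    total = 0
    for other in context_words:
        key = tuple(sorted((candidate, other)))
        total += pair_counts[key]  # Counter returns 0 for unseen pairs
    return total


def rank_suggestions(candidates, context_words, pair_counts):
    """Order candidate spellings by co-occurrence with the rest of the query."""
    return sorted(
        candidates,
        key=lambda c: cooccurrence_score(c, context_words, pair_counts),
        reverse=True,
    )


# Toy corpus (made up for this example).
docs = [
    "apache lucene search index",
    "lucene index spellchecker",
    "lucerne switzerland travel",
    "lucerne lake travel guide",
]
pair_counts = build_cooccurrence(docs)
print(rank_suggestions(["lucerne", "lucene"], ["search", "index"], pair_counts))
print(rank_suggestions(["lucene", "lucerne"], ["travel"], pair_counts))
```

With this toy corpus the same two candidates are ordered differently
depending on the surrounding query words, which is something an IDF score
computed on the isolated variants cannot do.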
> File based spellcheck with doc frequencies supplied
> ---------------------------------------------------
>
> Key: LUCENE-1532
> URL: https://issues.apache.org/jira/browse/LUCENE-1532
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/spellchecker
> Reporter: David Bowen
> Priority: Minor
>
> The file-based spellchecker treats all words in the dictionary as equally
> valid, so it can suggest a very obscure word rather than a more common word
> which is equally close to the misspelled word that was entered. It would be
> very useful to have the option of supplying an integer with each word which
> indicates its commonness. I.e. the integer could be the document frequency
> in some index or set of indexes.
> I've implemented a modification to the spellcheck API to support this by
> defining a DocFrequencyInfo interface for obtaining the doc frequency of a
> word, and a class which implements the interface by looking up the frequency
> in an index. So Lucene users can provide alternative implementations of
> DocFrequencyInfo. I could submit this as a patch if there is interest.
> Alternatively, it might be better to just extend the spellcheck API to have a
> way to supply the frequencies when you create a PlainTextDictionary, but that
> would mean storing the frequencies somewhere when building the spellcheck
> index, and I'm not sure how best to do that.
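
As a rough illustration of the behaviour requested in the description above,
the following Python sketch ranks dictionary words by edit distance first and
uses a supplied frequency only to break ties between equally close words. It
is not the proposed patch and does not use the Lucene spellcheck API: the
"word count" line format, the default count of 1, and the names
load_frequencies and suggest are all assumptions made for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def load_frequencies(lines):
    """Parse 'word count' lines (assumed format); a missing count becomes 1."""
    freqs = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        freqs[parts[0]] = int(parts[1]) if len(parts) > 1 else 1
    return freqs


def suggest(misspelled, freqs, max_suggestions=5):
    """Rank by ascending edit distance, then by descending frequency."""
    scored = [(levenshtein(misspelled, word), -freq, word)
              for word, freq in freqs.items()]
    scored.sort()
    return [word for _, _, word in scored[:max_suggestions]]


# Made-up dictionary entries with made-up frequencies.
freqs = load_frequencies(["hello 5000", "hallo 12", "hullo 3", "help 900"])
print(suggest("hxllo", freqs, max_suggestions=2))
```

Here "hello", "hallo" and "hullo" are all one edit away from the input, so
the supplied frequencies decide their order and the obscure variants are
ranked below the common word.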