Doug Cutting wrote:

David Spencer wrote:

Doug Cutting wrote:

And one should not try correction at all for terms which occur in a large proportion of the collection.



I keep thinking over this one and I don't understand it. If a user misspells a word and the "did you mean" spelling correction algorithm determines that a frequent term is a good suggestion, why not suggest it? The very fact that it's common could mean that it's more likely that the user wanted this word (well, the heuristic here is that users frequently search for frequent terms, which is probably wrong, but anyway...).


I think you misunderstood me. What I meant to say was that if the term the user enters is very common then spell correction may be skipped. Very common words which are similar to the term the user entered should of course be shown. But if the user's term is very common one need not even attempt to find similarly-spelled words. Is that any better?
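As a rough illustration of the rule described above (skip correction when the user's own term is already very common), here is a minimal Python sketch. The names `should_attempt_correction`, `doc_freq`, `num_docs` and the 5% cutoff are all assumptions made up for this example, not anything taken from Lucene.

```python
def should_attempt_correction(term, doc_freq, num_docs, common_ratio=0.05):
    """Return False when the user's term is already very common in the index.

    doc_freq maps each indexed term to the number of documents containing it
    (hypothetical structure); common_ratio is an arbitrary illustrative cutoff.
    """
    if num_docs <= 0:
        return False  # empty index: nothing to compare against
    df = doc_freq.get(term, 0)  # 0 means the term is not in the index at all
    return df / num_docs < common_ratio


# Hypothetical document frequencies for a 10,000-document collection.
doc_freq = {"parser": 4000, "descent": 80, "recursive": 120}

print(should_attempt_correction("parser", doc_freq, 10_000))     # very common term
print(should_attempt_correction("recursize", doc_freq, 10_000))  # absent term
```

Note that this only decides whether to look for alternatives to the entered term; it places no restriction on how common the suggested words may be.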

Yes, sure, thx, I understand now - but maybe not - the context I had in mind was something like this:


[1] The user enters a query like:
    recursize descent parser

[2] The search code parses this and sees that the 1st word is not a term in the index, but the next 2 are. So it ignores the last 2 terms ("descent" and "parser") and suggests alternatives to "recursize"... thus if any term is in the index, regardless of frequency, it is left as-is.

I guess you're saying that, if the user enters a term that appears in the index and thus is sort of spelled correctly (as it exists in some doc), then we use the heuristic that any sufficiently large doc collection will have tons of misspellings, so we assume that rare terms in the query might be misspelled (i.e. not what the user intended) and we suggest alternatives to these words too (in addition to the words in the query that are not in the index at all).
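To make that reading concrete, here is a small Python sketch, under stated assumptions: `doc_freq` is a hypothetical term-to-document-frequency map, `rare_ratio` is an arbitrary cutoff, and the standard library's `difflib.get_close_matches` stands in for whatever similarity measure a real "did you mean" implementation would use. Terms that are missing from the index or rare get suggestions; everything else is left as-is.

```python
import difflib


def suggest(query, doc_freq, num_docs, rare_ratio=0.001, n=3):
    """Suggest alternatives for query terms that are absent from the index or rare.

    Only terms more frequent than the entered term are offered as candidates.
    All names and thresholds here are illustrative assumptions.
    """
    suggestions = {}
    for term in query.lower().split():
        df = doc_freq.get(term, 0)
        if num_docs > 0 and df / num_docs >= rare_ratio:
            continue  # frequent enough in the index: leave the term alone
        # Absent (df == 0) or rare: look for similarly spelled, more frequent terms.
        candidates = [t for t, f in doc_freq.items() if f > df]
        matches = difflib.get_close_matches(term, candidates, n=n)
        if matches:
            suggestions[term] = matches
    return suggestions


# Hypothetical index statistics; "parsre" is a misspelling that made it into one doc.
doc_freq = {"recursive": 120, "descent": 80, "parser": 400,
            "recursion": 90, "parsre": 1}

print(suggest("recursize descent parser", doc_freq, 10_000))
print(suggest("parsre", doc_freq, 10_000))
```

The second call shows the "rare term" case: "parsre" is in the index, but it is rare enough that a more frequent, similarly spelled term is still offered.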



Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





