Doug Cutting wrote:

Aad Nales wrote:

Before I start reinventing wheels, I would like to check whether
anybody else has already tried this. A customer has asked us to look
into performing a spell check on queries. So far the most promising
approach seems to be an Analyzer based on the OpenOffice spellchecker.
My question is: has anybody tried this before?


Note that a spell checker used with a search engine should use collection frequency information. That is, only "corrections" which are more frequent in the collection than what the user entered should be displayed. Frequency information can also be used when constructing the checker. For example, one need never consider proposing terms that occur in very few documents. And one should not attempt correction at all for terms which occur in a large proportion of the collection.
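
The frequency rules above could be sketched roughly as follows. This is only an illustration, not code from Lucene: the function names, the plain dict of document frequencies, and the thresholds (min_docs, max_fraction) are all assumptions chosen for the example.

```python
def worth_correcting(term, doc_freq, num_docs, max_fraction=0.1):
    """Return False for terms that already occur in a large share of documents.

    doc_freq maps term -> number of documents containing it (hypothetical
    structure); max_fraction is an arbitrary cutoff chosen for illustration.
    """
    return doc_freq.get(term, 0) / num_docs <= max_fraction


def plausible_corrections(term, candidates, doc_freq, num_docs,
                          min_docs=2, max_fraction=0.1):
    """Keep only candidates that pass the frequency heuristics."""
    if not worth_correcting(term, doc_freq, num_docs, max_fraction):
        return []  # term is common in the collection: do not try to correct it
    entered = doc_freq.get(term, 0)
    return [
        c for c in candidates
        # never propose terms that occur in very few documents ...
        if doc_freq.get(c, 0) >= min_docs
        # ... and only show terms more frequent than what the user entered
        and doc_freq.get(c, 0) > entered
    ]


doc_freq = {"lucene": 40, "lucent": 3, "lucerne": 1, "the": 900}
suggestions = plausible_corrections(
    "lucen", ["lucene", "lucent", "lucerne"], doc_freq, num_docs=1000)
```

In a real index the frequencies would come from the index itself rather than a dict, and the thresholds would need tuning against the collection.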

Good heuristics, but are there any more precise, standard guidelines for how to balance or combine what I think are the following possible criteria for suggesting a better choice:


- ignore (penalize?) terms that are rare
- ignore (penalize?) terms that are common
- terms that are closer (by string distance) to the term entered are better
- terms that start with the same 'n' chars as the user's term are better
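
To make the question concrete, here is one possible way the four criteria might be combined into a single score. It is a sketch under stated assumptions, not a standard or recommended formula: the weights, the thresholds, the log damping of frequency, and all function names are made up for illustration.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def common_prefix_len(a, b):
    """Number of leading characters the two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def score(entered, candidate, doc_freq, num_docs,
          prefix_weight=0.1, min_docs=2, max_fraction=0.5):
    """Combine the criteria; every weight and cutoff here is an assumption."""
    df = doc_freq.get(candidate, 0)
    # ignore candidates that are too rare or too common
    if df < min_docs or df / num_docs > max_fraction:
        return 0.0
    longest = max(len(entered), len(candidate), 1)
    # closer strings score higher (1.0 means identical)
    similarity = 1.0 - edit_distance(entered, candidate) / longest
    # small bonus for sharing leading characters with the user's term
    prefix_bonus = prefix_weight * common_prefix_len(entered, candidate) / longest
    # damp raw frequency so it does not swamp string similarity
    return (similarity + prefix_bonus) * math.log(1 + df)


def suggest(entered, candidates, doc_freq, num_docs):
    """Rank candidates by score, dropping those that scored zero."""
    scored = [(score(entered, c, doc_freq, num_docs), c) for c in candidates]
    return [c for s, c in sorted(scored, reverse=True) if s > 0]


doc_freq = {"lucene": 40, "lucent": 3, "glucose": 5, "the": 900}
ranked = suggest("lucen", ["lucene", "lucent", "glucose", "the"], doc_freq, 1000)
```

Whether rare and common terms should be ignored outright (as here) or merely penalized is exactly the open question; replacing the hard cutoffs with a smooth penalty would be a small change to score().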






Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





