On Jan 24, 2005, at 7:24 AM, Kevin L. Cobb wrote:

Do stemming algorithms take into consideration abbreviations too?

No, they don't. Adding abbreviations, aliases, synonyms, etc. is not stemming.


And, the next logical question, if stemming does not take care of
abbreviations, are there any solutions that include abbreviations inside
or outside of Lucene?

Nothing built into Lucene does this, but the infrastructure allows it to be added as a custom analysis step. There are two basic approaches: adding aliases at indexing time, or adding them at query time by expanding the query. I created some example analyzers for Lucene in Action (grab the source code from the site linked below) that demonstrate how this can be done using WordNet (and mock) synonym lookup. You could adapt the same technique to look up abbreviations and add them to the token stream.
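
As a rough illustration of the indexing-time approach, here is a minimal sketch in plain Java. It deliberately does not use Lucene's classes, and it is not the code from the book: the AbbreviationExpander class, its Token type, and the lookup table are all hypothetical names made up for this example. The one idea it borrows is the position increment: an alias is emitted with an increment of 0 so that it occupies the same position as the original term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a custom analysis step; not a Lucene class.
final class AbbreviationExpander {

    // A term plus its position increment. An increment of 0 means
    // "same position as the previous token", which is how an alias
    // is stacked on top of the original term.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Assumed lookup table: lowercase abbreviation -> single-word expansions.
    private final Map<String, List<String>> aliases;

    AbbreviationExpander(Map<String, List<String>> aliases) {
        this.aliases = aliases;
    }

    // Splits on whitespace, lowercases each term, and injects any
    // expansions right after the term they belong to.
    List<Token> expand(String text) {
        List<Token> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // happens only for blank input
            }
            String term = raw.toLowerCase(Locale.ROOT);
            out.add(new Token(term, 1));
            List<String> expansions = aliases.get(term);
            if (expansions == null) {
                expansions = Collections.emptyList();
            }
            for (String alias : expansions) {
                out.add(new Token(alias, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new HashMap<>();
        table.put("dept", Arrays.asList("department"));
        table.put("intl", Arrays.asList("international"));

        AbbreviationExpander expander = new AbbreviationExpander(table);
        System.out.println(expander.expand("Intl Sales Dept"));
    }
}
```

The same expand step could instead be run over the query text, which is the query-time variant: the index stays unchanged and each abbreviation in the query is expanded into alternatives before searching.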


        http://www.lucenebook.com/search?query=synonyms

        Erik

