Pablo Gomes Ludermir wrote:
Hello all,
I know that we can expand a word to get its synonyms with WordNet. I was wondering whether we could reduce the index size by indexing one representative synonym in place of every word on its synonym list.
For instance, if "screen" shows up, I would like to replace it with "monitor" (it is a stupid example, but it was the first thing that crossed my mind). Instead of having both entries in the index, I would have only one.
I would then need to pre-process all queries as well, replacing words with their representative synonyms. I was wondering if someone has already done such a thing in an analyzer and could give me a little help.
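To make the idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class SynonymCollapser and its methods are hypothetical names invented for illustration, and the synonym group is hard-coded rather than read from WordNet. The same collapse step would have to be applied to tokens at index time and to query terms at search time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Lucene): maps every word of a synonym
// group to one representative, so only that representative is indexed.
final class SynonymCollapser {
    private final Map<String, String> canonical = new HashMap<>();

    // Registers a synonym group; 'representative' is the term that is kept.
    void addGroup(String representative, String... synonyms) {
        String rep = representative.toLowerCase(Locale.ROOT);
        canonical.put(rep, rep);
        for (String s : synonyms) {
            canonical.put(s.toLowerCase(Locale.ROOT), rep);
        }
    }

    // Lowercases each token and replaces it with its representative;
    // tokens without a registered group pass through unchanged.
    List<String> collapse(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            String key = t.toLowerCase(Locale.ROOT);
            out.add(canonical.getOrDefault(key, key));
        }
        return out;
    }

    public static void main(String[] args) {
        SynonymCollapser c = new SynonymCollapser();
        c.addGroup("monitor", "screen", "display");
        // Use the same call for document tokens and for query terms.
        System.out.println(c.collapse(Arrays.asList("the", "screen", "flickers")));
    }
}
```

One caveat of this design: words with several senses ("screen" as a verb, for example) get merged into the same entry, so a smaller index comes at some cost in precision.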
This is already done; I wrote it. Go to the bottom of this page and search for "wordnet": http://lucene.apache.org/java/docs/lucene-sandbox/
It runs in 2 phases:
[1] Parses part of WordNet and stores the synonyms in a Lucene index, used as a kind of persistent Map. This is run just once.
[2] Query expansion, which does things like expanding "monitor" to "monitor screen". This runs against an unchanged index.
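As a rough illustration of phase [2] only, the sketch below expands each query word with its synonyms. It is not the sandbox code: QueryExpander is a hypothetical class, and an in-memory HashMap stands in for the Lucene-backed synonym index built in phase [1].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the synonym lookup; the real code keeps the
// synonyms in a Lucene index, here a plain in-memory map is assumed.
final class QueryExpander {
    private final Map<String, List<String>> synonyms = new HashMap<>();

    // Registers synonyms for a single word.
    void addSynonyms(String word, String... syns) {
        synonyms.computeIfAbsent(word, k -> new ArrayList<>())
                .addAll(Arrays.asList(syns));
    }

    // Appends the synonyms of each query word right after that word,
    // keeping first-seen order and dropping duplicate terms.
    String expand(String query) {
        Set<String> terms = new LinkedHashSet<>();
        for (String word : query.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty query yields one empty token
            }
            terms.add(word);
            terms.addAll(synonyms.getOrDefault(word, Collections.emptyList()));
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        QueryExpander expander = new QueryExpander();
        expander.addSynonyms("monitor", "screen");
        System.out.println(expander.expand("monitor")); // prints "monitor screen"
    }
}
```

Because the expansion happens on the query side, the documents never need to be re-indexed, which is why this approach works against an unchanged index. It does not shrink the index, though.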
My aim is to reduce the index as much as possible (I already have a stemmer and a stopword filter in the analyzer). Could anyone point out other ways to reduce the number of terms in an index?
The fact is that I would like to create "extra vectors" with my own weighting scheme, and it is quite a costly algorithm, so the fewer terms I have, the better it performs.
Regards, Pablo