Luke Shannon wrote:
Hello;
Does anyone see a problem with the following approach?
No, no problem with it, and it's in fact what my "Wordnet Query Expansion" sandbox module does.
The nice thing about Lucene is you at least have the option of doing things the other way: you can write a custom Analyzer that puts all synonyms at the same token position (a position increment of 0) so they appear to be in the same place in the token stream. Thinking about it... this approach, with the Analyzer, lets users search for phrases which would match a synonym. So, using your example below, the text "bright red engine" would be matched by either the phrase "bright red" or the phrase "bright colour". Doing the query expansion is trickier if you allow phrases.
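To illustrate why stacking synonyms at one position makes phrase matching work, here is a minimal sketch in plain Java. It does not use Lucene at all: the class SynonymPositions and its methods analyze and matchesPhrase are hypothetical names, and the "index" is just a list of term sets, one set per token position. A real Analyzer would do the equivalent inside a TokenFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "synonyms at the same position"; not Lucene code.
public class SynonymPositions {
    // Synonym table taken from the example in this thread.
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("red", Arrays.asList("colour", "primary", "stop"));
    }

    // "Index time": every position holds the original word plus its synonyms.
    static List<Set<String>> analyze(String text) {
        List<Set<String>> positions = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            Set<String> terms = new HashSet<>();
            terms.add(word);
            terms.addAll(SYNONYMS.getOrDefault(word, Collections.<String>emptyList()));
            positions.add(terms);
        }
        return positions;
    }

    // Phrase match: each phrase word must occur at consecutive positions.
    static boolean matchesPhrase(List<Set<String>> positions, String phrase) {
        String[] words = phrase.toLowerCase().split("\\s+");
        for (int start = 0; start + words.length <= positions.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < words.length; i++) {
                if (!positions.get(start + i).contains(words[i])) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> doc = analyze("bright red engine");
        System.out.println(matchesPhrase(doc, "bright red"));
        System.out.println(matchesPhrase(doc, "bright colour"));
        System.out.println(matchesPhrase(doc, "bright green"));
    }
}
```

Because "colour" sits at the same position as "red", the phrase "bright colour" lines up with the document exactly as "bright red" does, with no change to the query side.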
For synonyms, rather than putting them in the index, I put the original term and all the synonyms in the query.
Every time I create a query, I check if the term has any synonyms. If it does, I create a BooleanQuery OR'ing one Query object for each synonym.
So if I have a synonym list:
red = colour, primary, stop
And someone wants to search the desc field for red, I would end up with something like:
( (desc:*red*) (desc:*colour*) (desc:*stop*) ).
I don't like that bit about substring terms, but if it's right for you, OK. If you insist on loosening things, I'd consider fuzzy terms (desc:red~ ...etc).
Now the synonyms wouldn't be in the index; the Query would account for all the possible synonym terms.
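The query-side expansion described above can be sketched roughly as follows. This is plain Java that builds a query string in Lucene's query syntax rather than constructing Query objects, so it runs without Lucene on the classpath. The class SynonymQueryExpander and its expand method are hypothetical names, and the sketch uses exact terms (optionally fuzzy, with a trailing ~) instead of the substring wildcards shown in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; builds a query string instead of Lucene Query objects.
public class SynonymQueryExpander {

    // OR together the original term and every synonym listed for it.
    static String expand(String field, String term,
                         Map<String, List<String>> synonyms, boolean fuzzy) {
        List<String> terms = new ArrayList<>();
        terms.add(term);
        terms.addAll(synonyms.getOrDefault(term, Collections.<String>emptyList()));

        StringBuilder query = new StringBuilder("(");
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                query.append(" OR ");
            }
            query.append(field).append(':').append(terms.get(i));
            if (fuzzy) {
                query.append('~'); // fuzzy term, as suggested in the reply
            }
        }
        return query.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("red", Arrays.asList("colour", "primary", "stop"));

        System.out.println(expand("desc", "red", synonyms, false));
        System.out.println(expand("desc", "engine", synonyms, false));
    }
}
```

With Lucene itself, the same idea would presumably be expressed as a BooleanQuery holding one optional clause per term; the string form here is only meant to show the shape of the expansion.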
Luke
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
