On Mar 17, 2009, at 5:44 AM, liat oren wrote:

Thanks for all the answers.

I am new to Lucene and in the emails its the first time I heard of the
bigrams and thus read about them a bit.

Question - if I query for "cat animal" - or use boosting - "cat^2
animal^0.5" - will the results return ONLY documents that contain both? From what I saw until now - it can also show documents that contain one of
them, no?

I think if you are using bigrams, then you would only match on one, but if you do the prefix/wildard approach you could match on either. I'm not sure if you will be able to pull off doing the individual term boosting and the bigrams. You will likely need to write your own Query classes to do that.

If you don't mind me asking, what is the problem you are trying to solve? I know the solution you want (I think, namely boosted bigrams of some sort), but I'm still clueless on the problem and I think that is really hindering me helping. It sounds like it is some type of co- occurrence problem, but I'm not sure. Is there a bigger category that what you are doing fits in? If you can't say, that is fine, too. It may be some proprietary thing.



Can you please elaborate a bit more on your suggestion?

I read a bit on the synonyms and the wordNet package.
Isn't there a way to use an index that is structured in the same way the index of the wordNet (any idea how is this index built?), but stores other
values?



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to