Hmmm, I have some inklings of an idea, but can we take a step back?
Can you explain the problem you are trying to solve at a higher level
(instead of the current solution)? I imagine it is something related
to co-occurrence analysis.
On Mar 8, 2009, at 8:05 AM, liat oren wrote:
Hi Grant,
No, you can only have two words - the score is between two words.
"cat dog" and "dog cat" are equivalent; it will actually always be
"cat dog", going by alphabetical order.
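The alphabetical-order convention described above can be sketched as a small lookup helper. This is only an illustration: the names `pair_key`, `pair_scores` and `related_score` are hypothetical, and a plain dictionary stands in for the real index of pair scores.

```python
def pair_key(word_a, word_b):
    """Return both words in alphabetical order, so lookup ignores argument order."""
    first, second = sorted((word_a, word_b))
    return f"{first} {second}"


# Hypothetical in-memory stand-in for the index of pair scores from the thread.
pair_scores = {
    pair_key("dog", "animal"): 0.5,
    pair_key("dog", "cat"): 0.2,
}


def related_score(word_a, word_b, default=0.0):
    """Look up the score for a pair; unknown pairs fall back to `default` (an assumption)."""
    return pair_scores.get(pair_key(word_a, word_b), default)
```

With this, `related_score("cat", "dog")` and `related_score("dog", "cat")` hit the same entry, stored under the key "cat dog".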
About the boosting, I read a bit about it but couldn't find how it can
help me, unless I change every appearance of the word dog to also have cat
and animal, weighted by their scores.
So, for example, every word will appear 10 times for each real occurrence:
if apple appears once, I will do the boosting so that it appears 10 times.
If dog appears, then the document will also have cat twice (0.2*10) and
animal 5 times (0.5*10).
But I hope there is a better solution.
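A minimal sketch of the repetition idea described above, using the numbers from the message. The function name `expand_terms` and the `related` mapping are hypothetical, and rounding the scaled score to a whole count is an assumption. It only shows the arithmetic, not how the terms would be written into a Lucene index.

```python
from collections import Counter

SCALE = 10  # each real occurrence is repeated 10 times, as in the message

# Hypothetical mapping from a word to its related words and scores.
# Only the dog -> cat / animal direction mentioned in the message is listed.
related = {"dog": {"cat": 0.2, "animal": 0.5}}


def expand_terms(words, related, scale=SCALE):
    """Count how often each term would appear after the weighted expansion."""
    counts = Counter()
    for word in words:
        counts[word] += scale  # the word itself, repeated `scale` times
        for other, score in related.get(word, {}).items():
            # Assumption: the scaled score is rounded to a whole repeat count.
            counts[other] += round(score * scale)
    return counts


expanded = expand_terms(["apple", "dog"], related)
```

For a document containing "apple dog", this yields apple and dog 10 times each, cat twice and animal 5 times.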
Thanks
2009/3/8 Grant Ingersoll <gsing...@apache.org>
Hi Liat,
Some questions inline below.
On Mar 8, 2009, at 5:49 AM, liat oren wrote:
Hi,
I have scores between words. For example, dog and animal have a score of
0.5 (and not 0), dog and cat have a score of 0.2, etc.
These scores are stored in an index:
Doc1: field words: dog animal
      field score: 0.5
Doc2: field words: dog cat
      field score: 0.2
If the user searches for the word dog, I would like documents that
contain the word animal or cat to also get a good score (one that takes
the 0.5 and 0.2 into account).
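One way to picture this request is query-time expansion, where the related words are added to the query with their scores as boosts. The sketch below only builds a query string in the `term^boost` form accepted by Lucene's classic query parser. The helper `expand_query` and the `related` mapping are hypothetical, and whether boosts combine the scores the way the poster wants is an open assumption.

```python
def expand_query(term, related):
    """Build a query string: the term itself plus each related word boosted by its score."""
    clauses = [term]
    # Highest-scoring related words first, purely for readability.
    for other, score in sorted(related.get(term, {}).items(), key=lambda kv: -kv[1]):
        clauses.append(f"{other}^{score}")
    return " ".join(clauses)


# Hypothetical mapping built from the pair scores given in the message.
related = {"dog": {"animal": 0.5, "cat": 0.2}}
query_string = expand_query("dog", related)
```

Here `query_string` is "dog animal^0.5 cat^0.2", which could then be handed to a query parser.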
Is it always the case that these come in pairs? In other words, would you
ever have:
field words: dog cat animal
score: 0.9
Also, is the following equivalent, or would it have a different score:
field words: cat dog
score: 0.2
Basically what I do is: for every document in the database, I loop over the
words that appear in the query (the query is long, about the size of an
article), and for every word that appears in each document I take the score
from the index mentioned above and calculate a score between the query and
each document.
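The loop described above might look roughly like the following. The message does not say how the per-word scores are combined, so summing them, and counting an exact match as 1.0, are assumptions. The name `score_document` and the dictionary keyed by alphabetically sorted pairs are hypothetical stand-ins for the real index.

```python
def score_document(query_words, doc_words, pair_scores):
    """Sum the pair scores between every distinct query word and every distinct document word."""
    total = 0.0
    for q in set(query_words):
        for d in set(doc_words):
            if q == d:
                total += 1.0  # assumption: an exact match counts as 1.0
            else:
                # Pairs are keyed in alphabetical order, so lookup is order-independent.
                total += pair_scores.get(tuple(sorted((q, d))), 0.0)
    return total


# Hypothetical stand-in for the score index from the thread.
pair_scores = {("animal", "dog"): 0.5, ("cat", "dog"): 0.2}

documents = {
    "doc_a": ["animal", "cat"],
    "doc_b": ["apple"],
}
scores = {name: score_document(["dog"], words, pair_scores)
          for name, words in documents.items()}
```

The nested loop makes the cost visible: every query word is compared with every word of every document, which is what makes this approach slow for article-length queries.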
Any suggestions on how to do this using Lucene search? How can I add these
values to the searcher?
Thinking...
I looked at the boosting option, but couldn't really see how it helps me
in this matter.
What "boosting option" did you look at? Can you explain a bit more?
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene:
http://www.lucidimagination.com/search
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org