Re: Scores between words. Boosting?

liat oren Mon, 09 Mar 2009 13:15:07 -0700

I have an index that stores a score for every pair of words.
I would like to use my own analyzer, which combines a whitespace tokenizer, a
stop-word filter and stemming.


Lucene's regular scoring takes the positions of the words into account.

I would like to add another factor to that score: the scores between words.
Instead of giving a score of 0 to words that are not equal, I would like to use
this index in the calculation.
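A minimal sketch of the extra factor described above, in plain Java with no Lucene dependency, assuming the pair scores have already been loaded into memory. The names PairSimilarity, put, key and similarity are hypothetical, and so is the choice of 1.0 for identical words. Pairs are keyed with the two words in alphabetical order, following the convention from earlier in the thread that "dog cat" is always stored as "cat dog".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: similarity between two (already analyzed) words that
// falls back to a pair-score table instead of 0 when the words differ.
final class PairSimilarity {
    private final Map<String, Double> pairScores = new HashMap<>();

    // Canonical key: the two words in alphabetical order, so ("dog", "cat")
    // and ("cat", "dog") share the key "cat dog".
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    void put(String a, String b, double score) {
        pairScores.put(key(a, b), score);
    }

    // Assumption: identical words score 1.0; pairs not in the table score 0.0.
    double similarity(String a, String b) {
        if (a.equals(b)) {
            return 1.0;
        }
        return pairScores.getOrDefault(key(a, b), 0.0);
    }

    public static void main(String[] args) {
        PairSimilarity sim = new PairSimilarity();
        sim.put("dog", "animal", 0.5);
        sim.put("dog", "cat", 0.2);
        System.out.println(sim.similarity("cat", "dog"));    // 0.2
        System.out.println(sim.similarity("animal", "dog")); // 0.5
        System.out.println(sim.similarity("dog", "apple"));  // 0.0
    }
}
```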

Is that a better explanation?

Thanks a lot,
Liat

2009/3/9 Grant Ingersoll <gsing...@apache.org>

> Hmmm, I have some inklings of an idea, but can we take a step back?  Can
> you explain the problem you are trying to solve at a higher level (instead
> of the current solution)?  I imagine it is something related to
> co-occurrence analysis.
>
>
>
> On Mar 8, 2009, at 8:05 AM, liat oren wrote:
>
> Hi Grant,
>>
>> No, you can only have two words - the score is between two words.
>>
>> "cat dog" and "dog cat" are equivalent; it will actually always be stored
>> as "cat dog", following alphabetical order.
>>
>> About boosting: I read a bit about it, but couldn't find how it could help
>> me, unless I change every occurrence of the word dog to also include cat
>> and animal, weighted by the score.
>> So, for example, every word would appear 10 times as often as it actually
>> does: if apple appears once, I would boost it so it appears 10 times.
>> If dog appears, then the document would also contain cat twice (0.2*10) and
>> animal 5 times (0.5*10).
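The repetition scheme described above can be sketched as a token-list rewrite done before indexing. This is a hypothetical illustration (RepetitionExpander, expand and FACTOR are made-up names), assuming the related-word map has already been built from the pair-score index. The map is directional, so if cat should also pull in dog, it needs entries in both directions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of index-time "boosting by repetition": every original
// word is written FACTOR times, and every related word is written
// round(pairScore * FACTOR) times.
final class RepetitionExpander {
    static final int FACTOR = 10;

    // related maps a word to its related words and pair scores,
    // e.g. dog -> {cat=0.2, animal=0.5}. Directional: add both directions if needed.
    static List<String> expand(List<String> words, Map<String, Map<String, Double>> related) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            for (int i = 0; i < FACTOR; i++) {
                out.add(w);
            }
            for (Map.Entry<String, Double> e : related.getOrDefault(w, Map.of()).entrySet()) {
                long copies = Math.round(e.getValue() * FACTOR);
                for (long i = 0; i < copies; i++) {
                    out.add(e.getKey());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> related =
                Map.of("dog", Map.of("cat", 0.2, "animal", 0.5));
        List<String> tokens = expand(List.of("apple", "dog"), related);
        // apple x10, dog x10, cat x2, animal x5
        System.out.println(tokens.size()); // 27
    }
}
```

One consequence of this design: with a factor of 10, every pair score is rounded to a multiple of 0.1, so pairs scoring below 0.05 add no copies at all, and the indexed text grows by at least a factor of 10.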
>>
>> But I hope there is a better solution.
>>
>>
>> Thanks
>> 2009/3/8 Grant Ingersoll <gsing...@apache.org>
>>
>> Hi Liat,
>>>
>>> Some questions inline below.
>>>
>>> On Mar 8, 2009, at 5:49 AM, liat oren wrote:
>>>
>>> Hi,
>>>
>>>>
>>>> I have scores between words; for example, dog and animal have a score of
>>>> 0.5 (and not 0), and dog and cat have a score of 0.2, etc.
>>>> These scores are stored in an index:
>>>> Doc1: field words: dog animal
>>>>      field score: 0.5
>>>> Doc2: field words: dog cat
>>>>      field score: 0.2
>>>>
>>>> If the user searches for the word dog, I would like documents that
>>>> contain the word animal or cat to also get a good score (one that takes
>>>> into account the 0.5 and 0.2).
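For comparison, a similar effect can be approximated at query time instead of index time, by expanding the query word into its related words with the pair scores as boosts. This query-time expansion is a different technique from the index-time repetition discussed in the thread and is named here only as an alternative. The sketch assumes Lucene's classic query-parser syntax, where term^boost sets a per-term boost; QueryExpander and expand are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of query-time expansion: turn one query word into a
// query string in Lucene's classic query-parser syntax, using each pair score
// as the boost (term^boost) of the related word. The original word keeps the
// default boost.
final class QueryExpander {
    static String expand(String word, Map<String, Double> related) {
        StringBuilder sb = new StringBuilder(word);
        // TreeMap iterates in alphabetical order, so the output is deterministic.
        for (Map.Entry<String, Double> e : new TreeMap<>(related).entrySet()) {
            sb.append(' ').append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("dog", Map.of("cat", 0.2, "animal", 0.5)));
        // prints: dog animal^0.5 cat^0.2
    }
}
```

The resulting string would then be parsed with the same analyzer used at index time, so the related words are stemmed the same way as the indexed text. With the parser's default OR operator, a document containing only cat still matches, with that term's contribution scaled down by the 0.2 boost.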
>>>>
>>>>
>>> Is it always the case that these come in pairs?  In other words, would
>>> you ever have:
>>> field words: dog cat animal
>>> score: 0.9
>>>
>>> Also, is the following equivalent, or would it have a different score:
>>> field words: cat dog
>>> score: 0.2
>>>
>>>
>>>
>>>
>>>> Basically, what I do is: for every document in the database, I loop over
>>>> the words that appear in the query (the query is long, about the size of
>>>> an article), and for every word that appears in each document I take the
>>>> score from the index mentioned above and calculate a score between the
>>>> query and each document.
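The brute-force loop described above might look roughly like this in plain Java. It is a sketch under stated assumptions, not Liat's actual code: BruteForceScorer, put, score and rank are hypothetical names, an exact match is assumed to count 1.0, and the per-pair contributions are assumed to be simply summed. Its cost is proportional to the number of documents times the query length times the document length, which is what makes an article-length query expensive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the brute-force loop: score every document against
// every query word by summing pair scores. Runs in O(#docs * |query| * |doc|).
final class BruteForceScorer {
    // Pair scores keyed as "a b" with the two words in alphabetical order.
    private final Map<String, Double> pairScores = new HashMap<>();

    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    void put(String a, String b, double score) {
        pairScores.put(key(a, b), score);
    }

    // Assumption: an exact match counts 1.0; all contributions are summed.
    double score(List<String> query, List<String> doc) {
        double total = 0.0;
        for (String q : query) {
            for (String d : doc) {
                total += q.equals(d) ? 1.0 : pairScores.getOrDefault(key(q, d), 0.0);
            }
        }
        return total;
    }

    // Returns document ids ordered from best to worst score.
    List<String> rank(List<String> query, Map<String, List<String>> docs) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, List<String>> e : docs.entrySet()) {
            scores.put(e.getKey(), score(query, e.getValue()));
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return ids;
    }
}
```

For example, with the pairs dog/animal = 0.5 and dog/cat = 0.2, the query "dog" ranks a document containing "animal farm" above one containing "cat sat", and both above one containing only "apple".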
>>>>
>>>> Any suggestions on how to do this using Lucene search? How can I add
>>>> these values to the searcher?
>>>>
>>>>
>>> Thinking...
>>>
>>>
>>>
>>>> I looked at the boosting option, but couldn't really see how it helps me
>>>> in this matter.
>>>>
>>>>
>>> What "boosting option" did you look at?  Can you explain a bit more?
>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>>
>
