Re: Scores between words. Boosting?

liat oren Mon, 16 Mar 2009 03:56:06 -0700

Hi,
Does anyone have an idea of how to make this work?
Many thanks,
Liat

2009/3/9 liat oren <oren.l...@gmail.com>


> I have an index that stores a score for every pair of words.
> My analyzer is a combination of a whitespace tokenizer, stop-word removal
> and stemming.
>
> Lucene's regular score takes into account the positions of the words.
>
> I would like to add another factor to that score: the scores between
> words. Instead of giving a score of 0 to words that are not equal, I would
> like to use this index in the calculation.
>
> Is that a better explanation?
>
> Thanks a lot,
> Liat
>
> 2009/3/9 Grant Ingersoll <gsing...@apache.org>
>
>> Hmmm, I have some inklings of an idea, but can we take a step back?  Can
>> you explain the problem you are trying to solve at a higher level (instead
>> of the current solution)?  I imagine it is something related to
>> co-occurrence analysis.
>>
>>
>>
>> On Mar 8, 2009, at 8:05 AM, liat oren wrote:
>>
>>> Hi Grant,
>>>
>>> No, there can only be two words; the score is always between two words.
>>>
>>> "cat dog" and "dog cat" are equivalent; it will actually always be stored
>>> as "cat dog", in alphabetical order.
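A minimal sketch of the alphabetical-order convention described above, in plain Java with no Lucene dependency: each pair is normalized so that "dog cat" and "cat dog" map to the same key. The class and method names (`PairScores`, `key`, `put`, `score`) are hypothetical, and an in-memory map stands in for the separate Lucene index of pair scores.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the index of word-pair scores.
class PairScores {
    private final Map<String, Double> scores = new HashMap<>();

    // Normalize the pair to alphabetical order, so ("dog", "cat") and
    // ("cat", "dog") both become "cat dog".
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    void put(String a, String b, double score) {
        scores.put(key(a, b), score);
    }

    // Returns the stored score, or 0 when the pair is unknown.
    double score(String a, String b) {
        return scores.getOrDefault(key(a, b), 0.0);
    }

    public static void main(String[] args) {
        PairScores ps = new PairScores();
        ps.put("dog", "animal", 0.5);
        ps.put("dog", "cat", 0.2);
        System.out.println(ps.score("cat", "dog")); // 0.2
        System.out.println(ps.score("dog", "cat")); // 0.2
    }
}
```

Normalizing at both write and lookup time means each pair only needs to be stored once.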
>>>
>>> About boosting: I read a bit about it, but couldn't see how it can help
>>> me, unless I change every occurrence of the word dog to also include cat
>>> and animal, weighted by the score.
>>> So, for example, every word would be repeated 10 times: if apple appears
>>> once, I would boost it so that it appears 10 times.
>>> If dog appears, the document would also get cat twice (0.2*10) and animal
>>> 5 times (0.5*10).
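A minimal sketch of the repetition workaround described above, assuming a scale factor of 10 and rounding score*10 to the nearest whole count. The class name `TokenExpansion`, the method `expand`, and the shape of the related-words map are hypothetical; the expanded token list would be handed to the indexer in place of the original tokens.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TokenExpansion {
    static final int SCALE = 10; // each original word is repeated 10 times

    // For each token, emit the token itself SCALE times, plus every related
    // word round(score * SCALE) times.
    static List<String> expand(List<String> tokens,
                               Map<String, Map<String, Double>> related) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (int i = 0; i < SCALE; i++) out.add(t);
            Map<String, Double> rel = related.getOrDefault(t, Map.of());
            for (Map.Entry<String, Double> e : rel.entrySet()) {
                long copies = Math.round(e.getValue() * SCALE);
                for (long i = 0; i < copies; i++) out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> dogRel = new LinkedHashMap<>();
        dogRel.put("cat", 0.2);
        dogRel.put("animal", 0.5);
        Map<String, Map<String, Double>> related = Map.of("dog", dogRel);
        List<String> out = expand(List.of("dog"), related);
        System.out.println(out.size()); // 17 (10 dog + 2 cat + 5 animal)
    }
}
```

One caveat on this approach: Lucene's classic `DefaultSimilarity` uses the square root of the term frequency, so ten copies of a word do not score ten times higher than one copy.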
>>>
>>> But I hope there is a better solution.
>>>
>>>
>>> Thanks
>>> 2009/3/8 Grant Ingersoll <gsing...@apache.org>
>>>
>>>> Hi Liat,
>>>>
>>>> Some questions inline below.
>>>>
>>>> On Mar 8, 2009, at 5:49 AM, liat oren wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have scores between words; for example, dog and animal have a score
>>>>> of 0.5 (not 0), dog and cat have a score of 0.2, etc.
>>>>> These scores are stored in an index:
>>>>> Doc1: field words: dog animal
>>>>>      field score: 0.5
>>>>> Doc2: field words: dog cat
>>>>>      field score: 0.2
>>>>>
>>>>> If the user searches for the word dog, I would like documents that
>>>>> contain the word animal or cat to also get a good score (one that takes
>>>>> into account the 0.5 and 0.2).
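One way to express this with stock Lucene, rather than rewriting the documents, might be query-time expansion: add each related word to the query with its pair score as a boost, using the query parser's `term^boost` syntax. The sketch below only builds the query string; `QueryExpansion`, `expandQuery` and the related-words map are hypothetical, and it assumes the string is later parsed by Lucene's `QueryParser` with its default OR operator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryExpansion {
    // Builds a query-parser string: the original word unboosted (boost 1.0),
    // plus each related word boosted by its pair score.
    static String expandQuery(String word, Map<String, Double> related) {
        StringJoiner q = new StringJoiner(" ");
        q.add(word);
        for (Map.Entry<String, Double> e : related.entrySet()) {
            if (e.getValue() > 0) { // skip pairs with no relationship
                q.add(e.getKey() + "^" + e.getValue());
            }
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> related = new LinkedHashMap<>();
        related.put("animal", 0.5);
        related.put("cat", 0.2);
        System.out.println(expandQuery("dog", related)); // dog animal^0.5 cat^0.2
    }
}
```

With an article-length query, expanding every word can produce many clauses; Lucene's `BooleanQuery` caps the clause count (1024 by default) and throws `TooManyClauses` beyond it, so the related-word lists may need trimming.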
>>>>>
>>>>>
>>>> Is it always the case that these come in pairs?  In other words, would
>>>> you ever have:
>>>> field words: dog cat animal
>>>> score: 0.9
>>>>
>>>> Also, is the following equivalent, or would it have a different score:
>>>> field words: cat dog
>>>> score: 0.2
>>>>
>>>>
>>>>
>>>>
>>>>> Basically, what I do is this: for every document in the database, I loop
>>>>> over the words in the query (the query is as long as an article), and
>>>>> for every word that appears in each document I take the score from the
>>>>> index mentioned above and calculate a score between the query and that
>>>>> document.
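A minimal sketch of the loop described above, outside Lucene: for each document, every (query word, document word) combination contributes 1.0 when the words are equal and the stored pair score otherwise. Summing the contributions is an assumption, since the message does not say how per-word scores are combined; `BruteForceScorer`, `similarity` and `score` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BruteForceScorer {
    private final Map<String, Double> pairScores = new HashMap<>();

    // Same alphabetical convention as the pair-score index: "cat dog", never "dog cat".
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + " " + b : b + " " + a;
    }

    void put(String a, String b, double s) {
        pairScores.put(key(a, b), s);
    }

    // 1.0 for identical words, the stored pair score for related words, else 0.
    double similarity(String a, String b) {
        if (a.equals(b)) return 1.0;
        return pairScores.getOrDefault(key(a, b), 0.0);
    }

    // Sums similarity over every (query word, document word) combination.
    double score(List<String> query, List<String> doc) {
        double total = 0.0;
        for (String q : query) {
            for (String d : doc) {
                total += similarity(q, d);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        BruteForceScorer s = new BruteForceScorer();
        s.put("dog", "animal", 0.5);
        s.put("dog", "cat", 0.2);
        List<List<String>> docs = List.of(
                List.of("the", "cat", "sat"),
                List.of("an", "animal", "barked"));
        for (List<String> doc : docs) {
            System.out.println(doc + " -> " + s.score(List.of("dog"), doc));
        }
    }
}
```

The cost is proportional to query length times document length for every document in the collection, which is why an article-sized query makes this approach slow compared with an index lookup.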
>>>>>
>>>>> Any suggestions on how to do this with Lucene search? How do I add these
>>>>> values to the searcher?
>>>>>
>>>>>
>>>> Thinking...
>>>>
>>>>
>>>>
>>>>> I looked at the boosting option, but couldn't really see how it helps me
>>>>> in this matter.
>>>>>
>>>>>
>>>> What "boosting option" did you look at?  Can you explain a bit more?
>>>>
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>> Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>>
>>
>>
>>
>
