Re: Scoring formula - Average number of terms in IDF

kdev Fri, 18 Dec 2009 02:13:23 -0800

The avg is used only in the idf method of the Similarity class. So I guess
there is workaround for what I want to do. Can you give me a reference, on
lucene doc, on how a IndexWriter uses the provided Similarity class?


Thanks again for your time and your help.


Michael McCandless-2 wrote:
> 
> IndexWriter uses Similarity.lengthNorm to create a norm (boost for the
> field, per document) based on the length of the field... it doesn't
> invoke the other methods on Similarity.
> 
> Are you saying you need to know the avg across the whole corpus before
> computing that boost?
> 
> Mike
> 
> On Thu, Dec 17, 2009 at 10:50 AM, kdev <v.verro...@di.uoa.gr> wrote:
>>
>> If I follow your approach, and produce the avg(outside of Lucene) while I
>> 'm
>> building the index(due to performance reasons I can't wait for all the
>> documents to arrive before indexing them) for a collection, the avg will
>> be
>> ready only when all the documents of the collection are indexed.
>> Lucene states that the new similarity class must be set in
>> IndexWriter.setSimilarity(), and be used while I build the index, and in
>> this time the avg isn't ready yet. Is there a way to overcome this? And
>> if
>> not calculating the score while the index is being created, and only when
>> searching the index, what will the consequence in performance be?
>>
>> (Mike thank you about your response)
>>
>>
>> Michael McCandless-2 wrote:
>>>
>>> There have been some discussions, here:
>>>
>>>     https://issues.apache.org/jira/browse/LUCENE-2091
>>>
>>> about how Lucene could track avg field/doc length, but they are just
>>> brainstorming type discussions now.
>>>
>>> You could always do something approximate outside of Lucene?  EG, make
>>> a TokenFilter that counts how many tokens are produced for each
>>> field/doc, aggregate & store that yourself, and use it in your
>>> similarity impl?
>>>
>>> Mike
>>>
>>> On Tue, Dec 15, 2009 at 5:04 AM, kdev <v.verro...@di.uoa.gr> wrote:
>>>>
>>>> any ideas please?
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26792364.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26830145.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Scoring-formula---Average-number-of-terms-in-IDF-tp26282578p26841521.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Scoring formula - Average number of terms in IDF

Reply via email to