Calculating the total occurrence count of a term across all documents in the
collection via the TermDocs route is costly if you do it at runtime for a probabilistic
retrieval model. However, this process can be done offline: you can create a new
index that has a Document for each term in the original index and a stored field with
the occurrence count calculated by the offline process. This could save you a lot
of runtime computation and also gives you the capability to store collection-level
statistics about a term.
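A minimal sketch of that offline pass, assuming the postings are already available as a plain in-memory map (a hypothetical stand-in, not Lucene's API). In Lucene 1.4 the same loop would walk IndexReader.terms() and, for each term, sum TermDocs.freq() over IndexReader.termDocs(term). The class and method names are illustrative only, and a TreeMap stands in for the per-term index with a stored count field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CollectionFrequencyBuilder {

    // Hypothetical stand-in for an inverted index: term -> (docId -> within-document frequency).
    // In Lucene this information would come from TermEnum/TermDocs; here it is a plain map.
    static Map<String, Long> buildCollectionFrequencies(Map<String, Map<Integer, Integer>> postings) {
        Map<String, Long> cf = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> entry : postings.entrySet()) {
            long total = 0;
            // Sum the term's frequency over every document that contains it,
            // which is what a TermDocs loop summing freq() would do.
            for (int freq : entry.getValue().values()) {
                total += freq;
            }
            cf.put(entry.getKey(), total);
        }
        return cf;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        postings.put("lucene", Map.of(0, 3, 1, 1));
        postings.put("probabilistic", Map.of(1, 2));

        // Offline step: compute once, then persist (e.g. one Document per term with a stored field).
        Map<String, Long> cf = buildCollectionFrequencies(postings);

        // Query time: a cheap lookup instead of walking the postings.
        System.out.println(cf.get("lucene"));        // 4
        System.out.println(cf.get("probabilistic")); // 2
    }
}
```

Because the counts are computed once, a query-time lookup costs a single map (or stored-field) read instead of a walk over every posting for the term.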

- Niranjan

Niranjan Balasubramanian
Software Engineer
Center For Natural Language Processing
(http://cnlp.syr.edu)
Syracuse University

>>> [EMAIL PROTECTED] 8/4/2004 11:34:40 AM >>>
On Aug 4, 2004, at 8:25 AM, ABDOU Samir wrote:
> What about the frequency of any given term in the whole collection!?

IndexReader.docFreq(Term t)

> Calculating this at runtime may considerably affect performance!

It's computed during indexing!  :)

        Erik
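Worth noting: IndexReader.docFreq(Term t) returns the number of documents that contain the term, not the number of times it occurs, so it gives the document frequency rather than the collection frequency asked about; the two differ whenever a term repeats within a document. The plain-Java sketch below (no Lucene dependency; the toy corpus and class name are hypothetical) computes both on the same data to show the difference.

```java
import java.util.List;

public class DocFreqVsCollectionFreq {

    // Number of documents containing the term at least once (the quantity docFreq reports).
    static int docFreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Total number of occurrences of the term across all documents (the collection frequency).
    static long collectionFreq(List<List<String>> docs, String term) {
        long count = 0;
        for (List<String> doc : docs) {
            for (String token : doc) {
                if (token.equals(term)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Toy collection of already-tokenized documents (hypothetical data).
        List<List<String>> docs = List.of(
                List.of("term", "collection", "term", "term"),
                List.of("term", "frequency"),
                List.of("lucene"));
        System.out.println(docFreq(docs, "term"));        // 2
        System.out.println(collectionFreq(docs, "term")); // 4
    }
}
```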


>
> Thanks,
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 4 August 2004 12:25
> To: Lucene Developers List
> Subject: Re: Term Collection Frequency?
>
> The new term vector feature will give you this exact information for a
> particular document or field.
>
>       Erik
>
>
> On Aug 4, 2004, at 3:59 AM, ABDOU Samir wrote:
>
>> Hi,
>>
>> In order to implement a new (probabilistic) search model within Lucene,
>> I need the collection frequency of each term (the number of occurrences
>> of a term within the collection). What would be the best way to
>> implement this?
>>
>> Any suggestions or ideas are welcome.
>>
>> Thanks,
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED] 
>> For additional commands, e-mail: [EMAIL PROTECTED] 
>
>