docFreq takes long time to execute in a multiple index environment

tierecke Sun, 05 Aug 2007 16:41:03 -0700

Hi there,

I have my 25 indexes of 1.8GB each read with MultiReader.
I try to get the document frequency of all the terms in specific documents
and it takes quite a long time - a document with 1000 terms takes around
4:30 minutes to calculate all the document frequencies of its terms - and
there are longer documents than that.


Since I have quite a lot of documents to process (around 12000) - it'll take
forever.
My function of getting the document frequency is listed below (it's for one
single term - but it's called for all the terms in the document term vector.

    public int getdocumentfrequency (String termstr) throws Exception
    {
        Term term=new Term("contents", termstr);
        TermEnum termenum=multireader.terms(term);
        int freq=termenum.docFreq();
        return freq;
    }

Is there a better (i.e. faster) way to get all the document frequencies of a
specific document?

thanks a lot,
Nir.

-- 
View this message in context: 
http://www.nabble.com/docFreq-takes-long-time-to-execute-in-a-multiple-index-environment-tf4221604.html#a12009334
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

docFreq takes long time to execute in a multiple index environment

Reply via email to