Re: Max Frequency and Tf/Idf

karl wettin Tue, 18 Apr 2006 02:58:23 -0700


On 18 Apr 2006, at 11:45, Danilo Cicognani wrote:

Following is the code we are using now: we were considering the possibility
of getting more information from Lucene (for example the maximum term
frequency in one document) to optimize the calculations.

The first method is the one that starts the calculation of Tf/Idf using the
class TTfIdf, whose constructor is reported below.

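                // First pass: find the maximum term frequency in this document.
                for (int i = 0; i < l; i++) {        // CAN BE OPTIMIZED IN SOME WAY?
                        if (freqs[i] > maxfreq) maxfreq = freqs[i];
                }
                this.freqs = new double[l];
                double tf;
                double idf;
                // Second pass: weight each term by normalized tf times idf.
                for (int i = 0; i < l; i++) {        // CAN BE OPTIMIZED IN SOME WAY?
                        tf = (double) freqs[i] / (double) maxfreq;
                        idf = Math.log((double) docs / (double) df[i]);
                        this.freqs[i] = tf * idf;
                }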

Not quite sure what you do above, but I guess you could calculate the information at index time. To persist it in the index, extend/hack TermFreqVector and related IO classes.
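
A minimal sketch of that idea, assuming the maximum frequency has already been computed once (for example when the document is indexed) and is simply passed in. The class and method names here are hypothetical, nothing in it uses Lucene's API, and persisting the value in the index is not shown. The zero checks are an added assumption to avoid dividing by zero.

```java
import java.util.Arrays;

public class TfIdfSketch {
    /** Returns the largest value in freqs, or 0 for an empty array. */
    public static int maxFrequency(int[] freqs) {
        int max = 0;
        for (int f : freqs) {
            if (f > max) max = f;
        }
        return max;
    }

    /**
     * Computes tf*idf weights, with tf normalized by maxFreq.
     * maxFreq is a parameter so that it can come from a value stored
     * earlier instead of being recomputed on every call.
     */
    public static double[] weights(int[] freqs, int[] df, int docs, int maxFreq) {
        double[] out = new double[freqs.length];
        if (maxFreq <= 0) return out;      // no terms: all weights stay 0
        for (int i = 0; i < freqs.length; i++) {
            if (df[i] <= 0) continue;      // guard against division by zero
            double tf = (double) freqs[i] / maxFreq;
            double idf = Math.log((double) docs / df[i]);
            out[i] = tf * idf;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] freqs = {4, 2, 1};
        int[] df = {10, 100, 1000};
        int docs = 1000;
        double[] w = weights(freqs, df, docs, maxFrequency(freqs));
        System.out.println(Arrays.toString(w));
    }
}
```

With the maximum supplied from outside, only the second loop remains at query time; the first loop moves to wherever the value is produced.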

