18 apr 2006 kl. 11.45 skrev Danilo Cicognani:
Following is the code we are using now: we was considering the
possiblity to
have more informations from Lucene (for example the maximum term
frequency
in one document) to optimized the calculations.
The first method is the one that start the calculation of Tf/Idf
using the
class TTfIdf whose constructor is reported below.
for(int i=0;i<l;i++){ // CAN BE OPTIMIZED IN SOME WAY?
if(freqs[i]>maxfreq) maxfreq=freqs[i];
}
this.freqs=new double[l];
double tf;
double idf;
for(int i=0;i<l;i++){ // CAN BE OPTIMIZED IN SOME WAY?
tf=(double)freqs[i]/(double)maxfreq;
idf=Math.log((double)docs/(double)df[i]);
this.freqs[i]=tf*idf;
}
Not quite sure what you do above, but I guess you could caclulate the
information at index time. To persist it in the index, extend/hack
TermFreqVector and related IO-classes.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]