That's relatively easy, but it isn't available out of the box...
Something like:
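// Needs: java.io.IOException, java.util.TreeMap,
// org.apache.lucene.index.{IndexReader, Term, TermFreqVector}
private TreeMap<Double, String> getTFIDF(String index, int documentId, String field) {
    // Note: keyed by score, so terms with identical scores overwrite each other.
    TreeMap<Double, String> tfIdfs = new TreeMap<Double, String>();
    try {
        IndexReader ir = IndexReader.open(index);
        try {
            // Returns null unless the field was indexed with term vectors.
            TermFreqVector tv = ir.getTermFreqVector(documentId, field);
            if (tv == null) {
                return tfIdfs;
            }
            String[] terms = tv.getTerms();
            // getTermFreqs is a helper not shown in this message; it is expected
            // to return one term frequency per entry in terms.
            double[] tf = getTermFreqs(tv);
            int numDocs = ir.numDocs();
            for (int i = 0; i < terms.length; i++) {
                int docFreq = ir.docFreq(new Term(field, terms[i]));
                // Divide as doubles: int division would truncate N / df.
                double idf = Math.log((double) numDocs / docFreq) / Math.log(2);
                tfIdfs.put(Double.valueOf(tf[i] * idf), terms[i]);
            }
        } finally {
            ir.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return tfIdfs;
}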
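
To make the scoring itself easier to check without an index at hand, here is a minimal sketch of the same tf * log2(N / df) weighting on plain Java collections. The class name TfIdfSketch, the methods idf and rank, and the sample counts are made up for illustration and are not part of Lucene. It assumes every term has a document frequency of at least 1. It returns a sorted list instead of a TreeMap keyed by score, so two terms with the same score are both kept.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {

    // log2(numDocs / docFreq), divided as doubles so the ratio is not truncated.
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / docFreq) / Math.log(2);
    }

    // termFreqs: term -> count in the document
    // docFreqs:  term -> number of documents containing the term (assumed >= 1)
    // Returns (term, tf * idf) pairs sorted by descending score.
    static List<Map.Entry<String, Double>> rank(Map<String, Integer> termFreqs,
                                                Map<String, Integer> docFreqs,
                                                int numDocs) {
        List<Map.Entry<String, Double>> scored = new ArrayList<Map.Entry<String, Double>>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.get(e.getKey());
            double score = e.getValue() * idf(numDocs, df);
            scored.add(new AbstractMap.SimpleEntry<String, Double>(e.getKey(), score));
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return scored;
    }

    public static void main(String[] args) {
        // Hypothetical counts for one document in an 8-document collection.
        Map<String, Integer> tf = new LinkedHashMap<String, Integer>();
        tf.put("lucene", 3);
        tf.put("the", 10);
        tf.put("index", 2);

        Map<String, Integer> df = new LinkedHashMap<String, Integer>();
        df.put("lucene", 2);
        df.put("the", 8);
        df.put("index", 4);

        for (Map.Entry<String, Double> e : rank(tf, df, 8)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

A term that occurs in every document gets an idf of 0 and so drops to the bottom of the ranking, however frequent it is in the document.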
Searching the mailing list archive might help as well:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200506.mbox/[EMAIL PROTECTED]
And see also:
http://www.alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html
Edgar
> -----Original message-----
> From: thanh nguyen [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 22, 2006 6:31 PM
> To: [email protected]
> Subject: Repeat Second time: Extract important terms by programming??
>
> Can anyone help me?