RE: Repeat Second time: Extract important terms by programming??

Edgar Meij Wed, 22 Mar 2006 14:43:28 -0800

That's relatively easy, but it's not available out of the box...

Something like:


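 // Needs java.io.IOException, java.util.TreeMap and, from org.apache.lucene.index,
 // IndexReader, Term and TermFreqVector.
 private TreeMap<Double, String> getTFIDF(String index, int documentID, String field)
         throws IOException {
     IndexReader ir = IndexReader.open(index);
     try {
         TreeMap<Double, String> tfIdfs = new TreeMap<Double, String>();
         // Null unless the field was indexed with term vectors enabled.
         TermFreqVector tv = ir.getTermFreqVector(documentID, field);
         if (tv == null) {
             return tfIdfs;
         }
         String[] terms = tv.getTerms();
         // Raw counts; stands in for the getTermFreqs(tv) helper, which is not shown.
         int[] tf = tv.getTermFrequencies();
         for (int i = 0; i < terms.length; i++) {
             int docFreq = ir.docFreq(new Term(field, terms[i]));
             // Divide in floating point; integer division would truncate N / df.
             double idf = Math.log((double) ir.numDocs() / docFreq) / Math.log(2);
             // Terms with identical scores overwrite each other in the map.
             tfIdfs.put(Double.valueOf(tf[i] * idf), terms[i]);
         }
         return tfIdfs;
     } finally {
         ir.close();
     }
 }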
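
One caveat with the snippet above: a TreeMap keyed on the score keeps only one term per distinct score, so terms that tie are silently dropped. Below is a minimal, Lucene-free sketch of the same tf * log2(N / df) weighting that keeps ties by sorting the terms instead. The class name TfIdfSketch, the method names, and the toy counts are all made up for illustration; in practice the two maps would be filled from the term vector and from IndexReader.docFreq().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {

    /** tf * log2(N / df), with the division done in floating point. */
    static double tfIdf(int tf, int docFreq, int numDocs) {
        return tf * (Math.log((double) numDocs / docFreq) / Math.log(2));
    }

    /** Returns all terms ordered by descending score; ties are broken alphabetically. */
    static List<String> rankTerms(Map<String, Integer> termFreqs,
                                  Map<String, Integer> docFreqs,
                                  int numDocs) {
        final Map<String, Double> scores = new HashMap<String, Double>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int df = docFreqs.get(e.getKey());
            scores.put(e.getKey(), tfIdf(e.getValue(), df, numDocs));
        }
        List<String> terms = new ArrayList<String>(scores.keySet());
        Collections.sort(terms, new Comparator<String>() {
            public int compare(String a, String b) {
                int byScore = Double.compare(scores.get(b), scores.get(a));
                return byScore != 0 ? byScore : a.compareTo(b);
            }
        });
        return terms;
    }

    public static void main(String[] args) {
        // Hypothetical counts for one document in an 8-document index.
        Map<String, Integer> tf = new HashMap<String, Integer>();
        tf.put("lucene", 3);
        tf.put("the", 10);
        tf.put("index", 2);
        Map<String, Integer> df = new HashMap<String, Integer>();
        df.put("lucene", 2);
        df.put("the", 8);
        df.put("index", 4);
        System.out.println(rankTerms(tf, df, 8)); // prints [lucene, index, the]
    }
}
```

With these counts "the" scores 0 because it occurs in every document, which is the behaviour you want from the IDF factor.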

Searching the mailing list archive might help as well:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200506.mbox/[EMAIL PROTECTED]

See also:
http://www.alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html


Edgar

> -----Original message-----
> From: thanh nguyen [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 22, 2006 6:31 PM
> To: java-user@lucene.apache.org
> Subject: Repeat Second time: Extract important terms by
> programming??
> 
> Can anyone help me?


