Hi,

If I understand correctly what you are trying to do to get corpusTF,
you might want to look at the implementation of the "-t" flag in
org.apache.lucene.misc.HighFreqTerms in contrib.

Take a look at the getTotalTermFreq method in trunk.


http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/misc/src/java/org/apache/lucene/misc/HighFreqTerms.java?view=markup

3.x version here:

http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/misc/src/java/org/apache/lucene/misc/HighFreqTerms.java?view=markup
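On the error you hit with reader.termDocsEnum(...): my guess is that you are calling it on a composite (multi-segment) reader, where the per-segment method does not apply; in trunk, MultiFields is the top-level entry point. Below is a minimal sketch of the 4.0-dev style loop, assuming the trunk API as of early 2011 (the helper name corpusTF is mine; verify the exact class and method names against the HighFreqTerms.java linked above):

```java
import java.io.IOException;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;

long corpusTF(IndexReader reader, String text) throws IOException {
    BytesRef term = new BytesRef(text);
    // Pass the reader's deleted-docs bits (null when nothing is deleted)
    // instead of constructing an empty OpenBitSet.
    Bits skipDocs = MultiFields.getDeletedDocs(reader);
    DocsEnum de = MultiFields.getTermDocsEnum(reader, skipDocs, "content", term);
    long freq = 0;
    if (de != null) {
        // Advance first, then read freq(): freq() is undefined
        // before the first nextDoc() call.
        while (de.nextDoc() != DocsEnum.NO_MORE_DOCS) {
            freq += de.freq();
        }
    }
    return freq;
}
```

Note the loop order: in your commented-out code, freq() is read before nextDoc(), which is the wrong way around for the enum API.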

Tom
http://www.hathitrust.org/blogs/large-scale-search


-----Original Message-----
From: nitinhardeniya [mailto:nitinharden...@gmail.com] 
Sent: Tuesday, March 22, 2011 1:57 PM
To: java-user@lucene.apache.org
Subject: TermDoc to TermDocsEnum

Hi,

I have code that works fine with Lucene 3.x, where I used TermDocs to find
the corpusTF. Here is the code:


public void calculateCorpusTF(IndexReader reader) throws IOException {
        Iterator<String> it = word.iterator();
        Iterator<wordProp> iwp = word_prop.iterator();
        wordProp wp;
        Term ta = null;
        TermDocs tds;
        // DocsEnum tds;
        String text;
        tfDoc tfcoll;
        long freq = 0;
        OpenBitSet skipDocs = new OpenBitSet(0);
        // System.out.println("Length: " + skipDocs.length());
        try {
                while (it.hasNext()) {
                        text = it.next();
                        wp = iwp.next();

                        System.out.println("Word is " + text);
                        ta = new Term("content", text);
                        // BytesRef term = new BytesRef(text.toCharArray(), 0, text.length());

                        tfcoll = new tfDoc();
                        freq = 0;

                        tds = reader.termDocs(ta);
                        // tds = reader.terms("content");
                        if (tds != null) {
                                while (tds.next()) {
                                        freq += tds.freq();
                                        // System.out.print(text + "  " + freq);
                                }
                        }

                        // New code -->
                        // tds = reader.termDocsEnum(skipDocs, "content", term);
                        // if (tds != null) {
                        //         while (true) {
                        //                 freq += tds.freq();
                        //                 final int docID = tds.nextDoc();
                        //                 if (docID == DocsEnum.NO_MORE_DOCS) {
                        //                         break;
                        //                 }
                        //         }
                        // }
                        // New code ends <--

                        tfcoll.tfA = freq;
                        System.out.print(text + "  " + freq);
                        if (tfcoll.totalTF() == 0) {
                                // System.out.println(" " + tfcoll.tfA + " " + tfcoll.tfD + " " + tfcoll.tfC);
                                System.out.println("Text " + text + " Freq " + freq);
                        }

                        wp.tfColl = tfcoll;
                }
        }
        catch (Exception e) {
                e.printStackTrace();
        }
}

But now I have to use DocsEnum, because I am using Lucene 4.0-dev, which
does not have TermDocs. I was trying to change my code; please refer to the
new code [commented out above] and tell me how to use this method properly.
If you can provide an example, that would be great:

tds = reader.termDocsEnum(skipDocs, "content", term);

I have tried passing null for skipDocs, because I don't want to skip
anything, but it throws an error.

please help

--
View this message in context: 
http://lucene.472066.n3.nabble.com/TermDoc-to-TermDocsEnum-tp2716046p2716046.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

