Re: conditional High Freq Terms in Lucene index

starz10de Sat, 31 Mar 2012 05:56:36 -0700

I revised it including your comment:



    private Scorer scorer;
    private int docBase;

    // record every matching document: the segment-local doc and the
    // current docBase are concatenated as strings and stored in doc_ids
    @Override
    public void collect(int doc) throws IOException {
        String k = doc + "";
        String k1 = docBase + "";
        doc_ids.add(k + k1);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase)
            throws IOException {
        this.docBase = docBase;
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        this.scorer = scorer;
    }
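
One thing that may matter here: collect() receives a segment-local doc number, and the usual way to turn it into an index-wide id is docBase + doc as an integer sum. The collector above instead concatenates the two numbers as text. The sketch below is plain Java with no Lucene dependency; DocIdKeys and its methods are hypothetical names used only for illustration. It shows how the two keys differ, on the assumption that the ids later read from reader.termDocs() on the top-level reader are index-wide.

```java
// Hypothetical helper, not part of Lucene: compares two ways of building a key
// from the values a Collector sees in collect() and setNextReader().
final class DocIdKeys {

    // Key as built in the collector above: doc followed by docBase, as text.
    static String concatenatedKey(int doc, int docBase) {
        return doc + "" + docBase;
    }

    // Index-wide id: the segment's docBase plus the segment-local doc number.
    static int globalDocId(int doc, int docBase) {
        return docBase + doc;
    }

    public static void main(String[] args) {
        // Assumed example values: a segment starting at docBase 100, local doc 5.
        System.out.println(concatenatedKey(5, 100)); // prints 5100
        System.out.println(globalDocId(5, 100));     // prints 105

        // Two different documents can produce the same concatenated key.
        System.out.println(concatenatedKey(1, 23).equals(concatenatedKey(12, 3))); // prints true
    }
}
```

If that assumption holds, storing the integer sum (for example in a Set<Integer>) and comparing it directly against dok.doc() would avoid both the mismatch and the string comparison.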
                        
                      
In the high-frequency term code I can see that the condition for document
type "A" is applied. However, the high-frequency terms are not computed
correctly: I still see duplicate terms in the list, along with wrong
occurrence counts.

Here is how I do it:

TermInfoQueue tiq = new TermInfoQueue(numTerms);
TermEnum terms = reader.terms();
TermDocs dok = null;
int k = 0;
dok = reader.termDocs();
if (field != null) {
    while (terms.next()) {
        k = 0;
        dok.seek(terms);
        while (dok.next()) {
            //System.out.println(dok.doc());
            for (int i = 0; i < doc_ids.size(); ++i) {
                if (categorization_based_on_year.doc_ids.get(i).equals(dok.doc() + "")) {
                    // here I can see that only doc ids for the type "A" are printed
                    System.out.println(dok.doc());
                    if (terms.term().field().equals(field)) {
                        tiq.insertWithOverflow(new TermInfo(terms.term(), dok.freq()));
                    }
                    i = 10000;
                }
            }
.
.
.

Any hints?
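
A possible explanation for the duplicates: the loop above calls insertWithOverflow once per matching document, so a term that occurs in several type "A" documents enters the queue several times, each time carrying only that one document's dok.freq(). A minimal sketch of the alternative is to accumulate one total per term over the allowed documents and add the term once. Plain Java maps stand in for TermEnum/TermDocs here; FilteredTermCounts, topTerms, and the sample data are hypothetical, and summing within-document frequencies is an assumption about what "occurrence" should mean.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the TermEnum/TermDocs loop; not Lucene API.
final class FilteredTermCounts {

    // postings: term -> (docId -> frequency of the term in that document).
    // Returns at most numTerms entries, highest total first, one entry per term.
    static List<Map.Entry<String, Integer>> topTerms(
            Map<String, Map<Integer, Integer>> postings,
            Set<Integer> allowedDocs,
            int numTerms) {
        List<Map.Entry<String, Integer>> totals = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Integer>> term : postings.entrySet()) {
            int total = 0;
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                // Only documents selected by the earlier search contribute.
                if (allowedDocs.contains(posting.getKey())) {
                    total += posting.getValue();
                }
            }
            // Add the term once, after all of its documents have been seen.
            if (total > 0) {
                totals.add(new AbstractMap.SimpleEntry<>(term.getKey(), total));
            }
        }
        // Highest total first; ties broken by term text for a stable order.
        totals.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(totals.subList(0, Math.min(numTerms, totals.size())));
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        postings.put("lucene", new HashMap<>(Map.of(0, 2, 1, 3, 2, 1)));
        postings.put("index", new HashMap<>(Map.of(1, 1, 2, 4)));
        postings.put("query", new HashMap<>(Map.of(2, 5)));

        Set<Integer> typeADocs = new HashSet<>(Set.of(0, 1)); // assumed type "A" doc ids

        System.out.println(topTerms(postings, typeADocs, 2)); // prints [lucene=5, index=1]
    }
}
```

To rank by the number of matching documents instead, the inner loop would add 1 per allowed document rather than the per-document frequency.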

--
View this message in context: 
http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873362.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
