Re: conditional High Freq Terms in Lucene index

starz10de Fri, 30 Mar 2012 16:27:03 -0700

Thanks for your hint.

I tried a simple solution, as follows.
First, I find the documents of type "A" by searching the document type field
in the index, and store their IDs in a list:
public static void doStreamingSearch(final Searcher searcher, Query query)
        throws IOException {

    // c (hit counter) and doc_id (list of IDs as strings) are fields
    // declared elsewhere in the enclosing class.
    Collector streamingHitCollector = new Collector() {
        // Offset of the current segment within the whole index.
        private int docBase;

        // Record the ID of every matching document.
        @Override
        public void collect(int doc) throws IOException {
            c++;
            // doc is relative to the current segment reader, so add
            // docBase to get an ID that is valid for the whole index.
            doc_id.add(String.valueOf(docBase + doc));
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true;
        }

        @Override
        public void setNextReader(IndexReader reader, int docBase)
                throws IOException {
            this.docBase = docBase;
        }

        @Override
        public void setScorer(Scorer scorer) throws IOException {
            // Scores are not needed here.
        }
    };

    searcher.search(query, streamingHitCollector);
}
Then I modified the HighFreqTerms class in Lucene as follows:
while (terms.next()) {
    dok.seek(terms);
    while (dok.next()) {
        for (int i = 0; i < doc_id.size(); ++i) {
            if (doc_id.get(i).equals(dok.doc() + "")) {
                if (terms.term().field().equals(field)) {
                    tiq.insertWithOverflow(new TermInfo(terms.term(), dok.freq()));
                }
            }
        }
    }
}
I verified that only documents of type "A" are considered. However, the
result is not correct: a few terms appear twice in the ordered list of
high-frequency terms.

Any hints as to where the problem is?
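
One possible cause, offered as a guess from the snippet alone: the loop above
calls insertWithOverflow once for every (term, matching document) pair, so a
term that occurs in several type "A" documents can enter the queue several
times, each time with the frequency of a single document. The sketch below
models the alternative of summing per term first and inserting once. It is a
library-free illustration, not Lucene code: RestrictedTermFreq, Posting,
TermCount and topTerms are hypothetical names, and Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Library-free model of the counting step; none of these names are Lucene API.
final class RestrictedTermFreq {

    /** One posting: the term occurs freq times in document docId. */
    static final class Posting {
        final int docId;
        final int freq;
        Posting(int docId, int freq) { this.docId = docId; this.freq = freq; }
    }

    /** A term with its frequency summed over the allowed documents. */
    static final class TermCount {
        final String term;
        final int count;
        TermCount(String term, int count) { this.term = term; this.count = count; }
        @Override public String toString() { return term + "=" + count; }
    }

    /** Returns at most n terms, highest summed frequency first. */
    static List<TermCount> topTerms(Map<String, List<Posting>> postings,
                                    Set<Integer> allowedDocs, int n) {
        // Min-heap on count: the weakest of the current top n sits at the head.
        PriorityQueue<TermCount> queue =
                new PriorityQueue<>(Comparator.comparingInt((TermCount t) -> t.count));
        for (Map.Entry<String, List<Posting>> entry : postings.entrySet()) {
            int total = 0;
            for (Posting p : entry.getValue()) {
                // Set lookup replaces the linear scan over the ID list.
                if (allowedDocs.contains(p.docId)) {
                    total += p.freq;
                }
            }
            if (total > 0) {
                // Insert once per term, after all of its documents were seen.
                queue.add(new TermCount(entry.getKey(), total));
                if (queue.size() > n) {
                    queue.poll(); // drop the current minimum
                }
            }
        }
        List<TermCount> result = new ArrayList<>(queue);
        result.sort(Comparator.comparingInt((TermCount t) -> t.count).reversed());
        return result;
    }
}
```

Applied to the loop above, this would mean accumulating dok.freq() in a local
variable inside while (dok.next()) and calling tiq.insertWithOverflow once per
term after that inner loop ends, and only when the sum is greater than zero.
Keeping the collected IDs in a Set<Integer> instead of a list of strings would
also avoid rescanning the whole list for every posting.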


--
View this message in context: 
http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3872309.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.