Thanks for your hint.
I tried a simple solution, as follows.
First, I determine the documents of type “A” by searching on the document
type field in the index, and store their IDs in a list (doc_id):
    public static void doStreamingSearch(final Searcher searcher, Query query)
            throws IOException {
        Collector streamingHitCollector = new Collector() {
            // simply print docId and score of every matching document
            @Override
            public void collect(int doc) throws IOException {
                c++;
                // System.out.println("doc=" + doc);
                doc_id.add(doc + "");
                // System.out.println("doc=" + doc );
                // scorer.score());
            }

            @Override
            public boolean acceptsDocsOutOfOrder() {
                return true;
            }

            @Override
            public void setNextReader(IndexReader arg0, int arg1)
                    throws IOException {
                // TODO Auto-generated method stub
            }

            @Override
            public void setScorer(Scorer arg0) throws IOException {
                // TODO Auto-generated method stub
            }
        };
        searcher.search(query, streamingHitCollector);
    }
Then I modified HighFreqTerms in Lucene as follows:
    while (terms.next()) {
        dok.seek(terms);
        while (dok.next()) {
            for (int i = 0; i < doc_id.size(); ++i) {
                if (doc_id.get(i).equals(dok.doc() + "")) {
                    if (terms.term().field().equals(field)) {
                        tiq.insertWithOverflow(new TermInfo(terms.term(), dok.freq()));
                    }
                }
            }
        }
    }
I have checked that only documents of type “A” are selected. However, the
result is not correct: a few terms appear twice in the ordered list of
high-frequency terms.
Any hints as to where the problem is?
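
To make the expected result concrete, here is a minimal sketch in plain Java
(no Lucene) of what I am trying to compute: for each term, the frequencies
summed over the type “A” documents only, and then the top N terms with
exactly one entry per term. The Posting class, the method names and the
sample data are made up for illustration and are not Lucene API. My guess,
and it is only a guess, is that inserting into the queue once per
(term, document) pair, as in my loop above, is what produces the repeated
terms, since the same term is then added once for every matching document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the index; nothing here is Lucene API.
class FilteredTermCounts {

    /** One posting: `term` occurs `freq` times in document `docId`. */
    static final class Posting {
        final String term;
        final int docId;
        final int freq;

        Posting(String term, int docId, int freq) {
            this.term = term;
            this.docId = docId;
            this.freq = freq;
        }
    }

    /** Sums frequencies per term, counting only the allowed documents. */
    static Map<String, Integer> sumPerTerm(List<Posting> postings,
                                           Set<Integer> allowedDocs) {
        Map<String, Integer> totals = new HashMap<>();
        for (Posting p : postings) {
            if (allowedDocs.contains(p.docId)) {
                // One running total per term, so a term cannot appear twice.
                totals.merge(p.term, p.freq, Integer::sum);
            }
        }
        return totals;
    }

    /** Returns up to n entries, highest total first, ties broken by term. */
    static List<Map.Entry<String, Integer>> topTerms(Map<String, Integer> totals,
                                                     int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data: documents 0 and 1 are of type "A", 2 is not.
        List<Posting> postings = Arrays.asList(
                new Posting("lucene", 0, 3),
                new Posting("lucene", 1, 2),
                new Posting("lucene", 2, 5),
                new Posting("index", 0, 1),
                new Posting("index", 2, 4),
                new Posting("query", 1, 7));
        Set<Integer> typeA = new HashSet<>(Arrays.asList(0, 1));

        for (Map.Entry<String, Integer> e : topTerms(sumPerTerm(postings, typeA), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With this sample data, "lucene" is counted as 3 + 2 = 5 (document 2 is
ignored) and shows up once, behind "query" with 7. Keeping the IDs in a Set
also avoids scanning the whole doc_id list for every posting.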
--
View this message in context:
http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3872309.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.