I revised it to incorporate your comment:
private Scorer scorer;
private int docBase;

// simply print docId and score of every matching document
@Override
public void collect(int doc) throws IOException {
    String k = doc + "";
    String k1 = docBase + "";
    doc_ids.add(k + k1);
}

@Override
public boolean acceptsDocsOutOfOrder() {
    return true;
}

@Override
public void setNextReader(IndexReader reader, int docBase) throws IOException {
    this.docBase = docBase;
}

@Override
public void setScorer(Scorer scorer) throws IOException {
    this.scorer = scorer;
}
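
A note on the IDs stored by the collector above: collect() receives a doc number relative to the current segment reader, and the index-wide ID is the numeric sum docBase + doc. Joining the two numbers as strings gives a different value, so the later comparison against dok.doc() may not match what was intended. The following is only a minimal stand-alone sketch of that difference (no Lucene involved; the class and method names are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone demo; names are illustrative, not Lucene API.
class DocIdDemo {
    // Index-wide ID = segment base + segment-relative doc number.
    static int globalId(int docBase, int doc) {
        return docBase + doc;
    }

    // What string concatenation of the two numbers produces instead.
    static String concatenated(int docBase, int doc) {
        return doc + "" + docBase;
    }

    public static void main(String[] args) {
        int docBase = 100;
        int doc = 5;
        System.out.println(concatenated(docBase, doc)); // text "5100"
        System.out.println(globalId(docBase, doc));     // number 105

        // Storing Integer IDs in a Set also makes the later lookup a single
        // contains() call instead of a loop over a list of strings.
        Set<Integer> ids = new HashSet<Integer>();
        ids.add(globalId(docBase, doc));
        System.out.println(ids.contains(105));
    }
}
```

If the index has only one segment, docBase is 0 and the concatenated form is the doc number followed by "0", which still differs from the plain doc number.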
I can see in the high-frequency-term code that the condition for document
type "A" is applied. However, the high-frequency terms are not computed
correctly: I still see duplicate terms in the list, along with wrong
occurrence counts. Here is how I do it:
TermInfoQueue tiq = new TermInfoQueue(numTerms);
TermEnum terms = reader.terms();
TermDocs dok = null;
int k = 0;
dok = reader.termDocs();
if (field != null) {
    while (terms.next()) {
        k = 0;
        dok.seek(terms);
        while (dok.next()) {
            //System.out.println(dok.doc());
            for (int i = 0; i < doc_ids.size(); ++i) {
                if (categorization_based_on_year.doc_ids.get(i).equals(dok.doc() + "")) {
                    // here I can see that only doc ids for type "A" are printed
                    System.out.println(dok.doc());
                    if (terms.term().field().equals(field)) {
                        tiq.insertWithOverflow(new TermInfo(terms.term(), dok.freq()));
                    }
                    i = 10000;
                }
            }
.
.
.
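
One possible source of the duplicates, judging only from the part of the loop shown: insertWithOverflow is called inside the dok.next() loop, so a term that occurs in several matching documents is queued once per document, each time with that single document's freq. A hedged sketch of the alternative, summing the frequency over the matching documents and recording one entry per term, is below. It is plain Java with hypothetical names; a "posting" is just a {docId, freq} pair standing in for what TermDocs returns:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone demo; not Lucene API.
class TermTotalsDemo {
    // Sum freq over the postings whose docId is in the allowed set.
    static int totalFreq(int[][] postings, Set<Integer> allowed) {
        int total = 0;
        for (int[] p : postings) {
            if (allowed.contains(p[0])) {
                total += p[1];
            }
        }
        return total;
    }

    // One entry per term: the term mapped to its summed frequency.
    static Map<String, Integer> totals(Map<String, int[][]> index,
                                       Set<Integer> allowed) {
        Map<String, Integer> out = new HashMap<String, Integer>();
        for (Map.Entry<String, int[][]> e : index.entrySet()) {
            int t = totalFreq(e.getValue(), allowed);
            if (t > 0) {
                out.put(e.getKey(), t); // the single insert for this term
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, int[][]> index = new HashMap<String, int[][]>();
        index.put("lucene", new int[][] {{1, 3}, {2, 4}, {7, 5}});
        index.put("solr", new int[][] {{7, 2}});

        Set<Integer> allowed = new HashSet<Integer>();
        allowed.add(1);
        allowed.add(2);

        // "lucene" gets one entry with 3 + 4 = 7; "solr" has no matching doc.
        System.out.println(totals(index, allowed));
    }
}
```

In the Lucene loop this would correspond to accumulating dok.freq() in a local counter while iterating dok.next(), and calling insertWithOverflow once per term after that inner loop ends.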
Any hints?
--
View this message in context:
http://lucene.472066.n3.nabble.com/conditional-High-Freq-Terms-in-Lucene-index-tp3868066p3873362.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.