Hello,
I want to use Lucene to find similar documents, based on a Boolean query
(similar metadata combined with OR clauses) and on the user's ratings of
documents returned by earlier searches.
I intend to implement a Naive Bayes classifier that sorts documents into
liked/disliked classes, and to apply it from within a HitCollector:
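// HitCollector is an abstract class, so it is extended rather than implemented.
class ClassifyingHitCollector extends HitCollector {
    // Called by the searcher once for every matching document.
    public void collect(int doc, float score) {
        // classify document
        // if document is liked -> add to hit collection
    }
}
...
ClassifyingHitCollector c = new ClassifyingHitCollector();
searcher.search(query, c);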
This means the Bayes classification has to be computed for every
matching document. Is there a way to do this (during the search) for only
the top n matching documents, or do I have to use the
searcher.search(..) overload that returns Hits and run the
classification on the top n matching documents after the Lucene search?
Is there another, more efficient way to change the scoring of the
search(..) method?
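To make the "top n only" idea concrete, here is a minimal sketch of what I
have in mind, in plain Java with no Lucene dependency. The names
TopNClassifyingCollector, ScoredDoc, likedDocs and isLiked are made up for
illustration; collect(int, float) only mirrors the shape of
HitCollector.collect, and the isLiked predicate stands in for the Naive
Bayes classifier, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a hit collector; it does not use Lucene classes.
class TopNClassifyingCollector {
    // One retained hit: document number plus its score.
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int n;
    // Min-heap on score, so the weakest retained hit is always at the head.
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopNClassifyingCollector(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    // Cheap per-hit work only: keep the n best scores, no classification yet.
    public void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                        // drop the current weakest hit
            heap.add(new ScoredDoc(doc, score));
        }
    }

    // Runs the (assumed) classifier on the n best hits only, best score first.
    public List<Integer> likedDocs(IntPredicate isLiked) {
        List<ScoredDoc> top = new ArrayList<>(heap);
        top.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> liked = new ArrayList<>();
        for (ScoredDoc sd : top) {
            if (isLiked.test(sd.doc)) liked.add(sd.doc);
        }
        return liked;
    }
}
```

With this shape the classifier would run at most n times per search,
whatever the number of matches, at the cost of never seeing liked documents
that score below the top n.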
TIA,
Rainer