I would like to search a collection of "keyword"s with lucene.
A Document has one or many keywords. The keywords appear only once in a
document. (tf = 1)
for example:
Document_1 : ( "aa" "bb" "cc" )
Document_2 : ( "bb" "cc" )
Document_3 : ( "cc" "dd" )
Document_4 : ( "aa" "cc" "dd" )
I have a query from more terms with different boost. The coord(int overlap,
int maxOverlap) is turn off. i.e. always return 1.0.
query = "aa^0.1 bb^0.9 xx^0.1 yy^0.1 zz^0.1"
the query may contain many terms which do not appear in a Document. i.e.
"xx" "yy" and "zz" here.
Amd I got
3 hits
Document_2 : ( "bb" "cc" ) : score : 0.75391763
Document_1 : ( "aa" "bb" "cc" ) : score : 0.67014897
Document_4 : ( "aa" "cc" "dd" ) : score : 0.0670149
[Question] is...why Document_2 better than Document_1 !?
Document_1 does match two terms; "aa" and "bb".
I want to emphasize the "match" and less care the "mismatch".
How should I modify Similarity to achieve that? (Document_1 should get
higher score!)
Is there any suggestion or example to implement such "keyword collection"
searching?
For the query above,
I actually use BooleanQuery with TermQuery. What else should I take care of?
/* ************************************************** */
BooleanQuery q = new BooleanQuery(true); // disable coord
TermQuery tq;
{
tq = new TermQuery(new Term(field, "aa"));
tq.setBoost(.1f);
q.add(tq, BooleanClause.Occur.SHOULD);
}
{
tq = new TermQuery(new Term(field, "bb"));
tq.setBoost(.9f);
q.add(tq, BooleanClause.Occur.SHOULD);
}
{
tq = new TermQuery(new Term(field, "xx"));
tq.setBoost(.1f);
q.add(tq, BooleanClause.Occur.SHOULD);
}
....
Hits hits = isearcher.search(q);
/* ************************************************** */
Thanks
Lin