Recent questions about whether/how scores are normalized got me wondering
how
my application (happily) seems to be doing what I want. I have two
indexes, one
which contains text fields which I want to use as queries into text fields
in a second index.
I create a Boolean query based on all the terms in a document in my first
index, with all terms added with a SHOULD condition. I then apply this
query to my
second index. I get the hits, and starting from the best hits I look at
the score and (arbitrarily) only
report those with a score greater than 0.3. Otherwise I move on to the
next document
in my first index.
For a given query (for a single input document), the highest score is
*not* always 1 (which is just how
I want it). Is this because I am using a Boolean query? Here is my code
snippet.
For the ith document in my input index......
TermFreqVector tfv =
indexReaderOS.getTermFreqVector(i,"required skills");
String inputtid =
indexReaderOS.document(i).getField("inputid").stringValue();
if (tfv !=null) {
BooleanQuery bq = new BooleanQuery();
String[] terms = tfv.getTerms();
for (int j=0; j<terms.length; j++) {
String term = terms[j];
Query query = parser.parse(term);
bq.add(query,
BooleanClause.Occur.SHOULD);
}
Hits hitsR = isearcherR.search(bq);
for (int ii=0; ii< hitsR.length(); ii++) {
Document hitRDoc = hitsR.doc(ii);
String hitid =
hitRDoc.get("empid");
float scoreR = hitsR.score(ii);
if (scoreR<0.30) break;
outfile.println(inputtid+","+empid+","+scoreR);
}
Donna Gresh