normalized scores

Donna L Gresh Thu, 29 Mar 2007 07:03:52 -0800

Recent questions about whether/how scores are normalized got me wondering 
how
my application (happily) seems to be doing what I want. I have two 
indexes, one
which contains text fields which I want to use as queries into text fields 
in a second index.


I create a Boolean query based on all the terms in a document in my first
index, with all terms added with a SHOULD condition. I then apply this 
query to  my 
second index. I get the hits, and starting from the best hits  I look at 
the score and (arbitrarily) only
report those with a score greater than 0.3. Otherwise I move on to the 
next document
in my first index.

For a given query (for a single input document), the highest score is 
*not* always 1 (which is just how 
I want it). Is this because I am using a Boolean query? Here is my code 
snippet.


                           For the ith document in my input index......
                        TermFreqVector tfv = 
indexReaderOS.getTermFreqVector(i,"required skills");
                        String inputtid = 
indexReaderOS.document(i).getField("inputid").stringValue();
                        if (tfv !=null) {
                                BooleanQuery bq = new BooleanQuery();
                                String[] terms = tfv.getTerms();
                                for (int j=0; j<terms.length; j++) {
                                        String term = terms[j];
                                        Query query = parser.parse(term);
                                        bq.add(query, 
BooleanClause.Occur.SHOULD);
                                }
                                Hits hitsR = isearcherR.search(bq);
 
                                for (int ii=0; ii< hitsR.length(); ii++) {
                                        Document hitRDoc = hitsR.doc(ii);
                                        String hitid = 
hitRDoc.get("empid");
                                        float scoreR = hitsR.score(ii);
 
                                        if (scoreR<0.30) break;
 outfile.println(inputtid+","+empid+","+scoreR);
 
                                }

 

Donna Gresh

normalized scores

Reply via email to