I am dissatisfied with the document scores I'm getting. If a document is short
and has one occurrence of the search term, it is ranked higher than a longer document
with two occurrences of the term. This makes little sense to me; I'd like the
longer document with more occurrences to be ranked higher. I figured I would have to
override the scoring method, but I can't find where Lucene actually does the scoring.
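The ranking described above is what a "term frequency times length norm" model predicts. The sketch below assumes the factors used by Lucene's default scoring in 1.x-era releases: a term-frequency factor of sqrt(freq) multiplied by a field length norm of 1/sqrt(numTerms). For a one-term query, idf, boosts and query normalization are the same for every document, so they are left out. The document lengths (10 and 100 terms) are hypothetical, and real Lucene stores the norm in a single lossy byte, so actual scores will differ slightly.

```java
// Simplified model of Lucene-style default scoring for a one-term query.
// Assumption: score ~ tf(freq) * lengthNorm(numTerms); idf, boosts and
// query normalization are omitted because they are equal for both documents.
public class DefaultScoreDemo {

    // Term-frequency factor: grows with the square root of the occurrence count
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Length norm: penalizes longer fields by 1/sqrt(number of terms)
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    static double score(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Hypothetical documents: short (10 terms, 1 hit) vs. long (100 terms, 2 hits)
        double shortDoc = score(1, 10);
        double longDoc = score(2, 100);
        System.out.printf("short: %.4f%n", shortDoc); // short: 0.3162
        System.out.printf("long:  %.4f%n", longDoc);  // long:  0.1414
    }
}
```

Doubling the occurrences only multiplies the tf factor by about 1.41, while a field ten times longer divides the norm by about 3.16, so under this model the short document wins.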
This is not an uncommon problem for me: I find the API confusing to navigate,
largely because the Javadoc documentation is not comprehensive. (That is something
even Sun doesn't spend much time on.) I have tried reading the code, but the
variable names are terse and there are very few comments, which makes it fairly
hard to follow.
This is the code I'm using. Am I doing the right thing by using the Query object,
or should I be using a different class, such as TermQuery? Does TermQuery score
differently, so that I might be happier with its behavior? If not, where can I
find the method that actually computes a Document's score, so that I can modify it?
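import org.apache.lucene.analysis.* ;     // Analyzer, SimpleAnalyzer
import org.apache.lucene.queryParser.* ;  // QueryParser (Lucene 1.x package name)
import org.apache.lucene.search.* ;       // Searcher, IndexSearcher, Query, Hits

Hits find ( String string_searchString, String string_indexPath )
{
    Searcher indexSearcher ;
    Analyzer analyzer ;
    Query query ;
    Hits searchResults_Hits ;
    try
    {
        // Open the index stored in the given directory
        indexSearcher = new IndexSearcher ( string_indexPath ) ;
        analyzer = new SimpleAnalyzer () ;
        // Parse the search string against the "DocumentText" field
        query = QueryParser.parse ( string_searchString,
            "DocumentText", analyzer ) ;
        // The searcher stays open: Hits fetches documents from it lazily
        searchResults_Hits = indexSearcher.search ( query ) ;
        return searchResults_Hits ;
    }
    catch ( Exception e )
    {
        // IndexSearcher throws IOException; QueryParser.parse throws ParseException
        e.printStackTrace () ;
        return null ;
    }
}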
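For a single-word search string, QueryParser with SimpleAnalyzer should already produce a TermQuery, so using TermQuery directly is unlikely to change the scores. The factor behind the ranking is the length norm. In Lucene releases from about 1.3 onward (an assumption worth checking against the Javadoc for your version), it comes from `lengthNorm(String fieldName, int numTerms)` on `Similarity`, whose default implementation is `DefaultSimilarity`. A subclass that returns `1.0f` there, installed with `IndexWriter.setSimilarity` and `Searcher.setSimilarity`, would disable it; because norms are computed at indexing time, the index would have to be rebuilt. The standalone model below uses hypothetical names (it is not Lucene's API) to show the effect of that change on a short document (10 terms, one occurrence) and a long one (100 terms, two occurrences), both made up for illustration.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical model: compares default-style length normalization with a
// version that turns it off (norm fixed at 1.0), as an overridden lengthNorm would.
public class NoLengthNormDemo {

    // Default-style norm: 1/sqrt(number of terms in the field)
    static final IntToDoubleFunction DEFAULT_NORM = n -> 1.0 / Math.sqrt(n);
    // Overridden norm: every field length counts the same
    static final IntToDoubleFunction NO_NORM = n -> 1.0;

    // Score for a one-term query: sqrt(freq) * lengthNorm(numTerms)
    static double score(int freq, int numTerms, IntToDoubleFunction norm) {
        return Math.sqrt(freq) * norm.applyAsDouble(numTerms);
    }

    public static void main(String[] args) {
        IntToDoubleFunction[] norms = { DEFAULT_NORM, NO_NORM };
        String[] labels = { "default norm", "no norm" };
        for (int i = 0; i < norms.length; i++) {
            double shortDoc = score(1, 10, norms[i]);  // 10 terms, 1 occurrence
            double longDoc = score(2, 100, norms[i]);  // 100 terms, 2 occurrences
            System.out.printf("%s: short=%.4f long=%.4f%n", labels[i], shortDoc, longDoc);
        }
    }
}
```

With the norm removed, the document with more occurrences ranks first regardless of length. The trade-off is that long documents, which tend to contain more occurrences simply by being long, will then tend to rise to the top.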