Doug,

Well, my good intentions (to reindex on Thursday night) were interrupted by Hurricane Isabel (followed by a 44-hour power outage).
Well, excuses aside, I did get the reindex done today and the scores for all hits from a single date query come out to be the same score (as they should). Don't have any idea what screwed up the previous index - though as promised, I'll keep an eye on it as I continue to merge new stuff over the next few days/weeks.

Is there a way, using standard Lucene configuration parameters and/or APIs, to force the hit scores to come out so the highest one is set to 1, and the others are proportionately lower?

Regards,

Terry

----- Original Message -----
From: "Terry Steichen" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Thursday, September 18, 2003 10:10 AM
Subject: Re: Lucene Scoring Behavior

> Doug,
>
> I just extracted a portion of the database, reindexed and found the scores
> to come out much more like we'd expect. Appears this may be an indexing
> issue - I index new stuff each day and merge the new index with the master
> index. Only redo the master when I can't avoid it (because it takes so
> long). I probably merge 100 times or more before reindexing. This evening
> I'll reindex and let you know if the apparent problem clears up. If so,
> I'll keep track of it as I continue to merge and see if there's any issue
> there.
>
> Thanks for the input (and from Erik, pointing me to the Explanation - it's
> pretty neat).
>
> Question: The new scores for the test database portion mentioned above all
> seem to come out in the range of .06 to .07. I assume this is because they
> never get normalized. If this is the case, (a) would it hurt anything to
> "normalize up" (so the scores range up to 1), and if so (b) is there an
> easy, non-disruptive (to the source code) way to do this?
>
> Regards,
>
> Terry
>
>
> ----- Original Message -----
> From: "Doug Cutting" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Wednesday, September 17, 2003 11:15 PM
> Subject: Re: Lucene Scoring Behavior
>
>
> > Hmm. This makes no sense to me. Can you supply a reproducible
> > standalone test case?
> >
> > Doug
> >
> > Terry Steichen wrote:
> > > Doug,
> > >
> > > (1) No, I did *not* boost the pub_date field, either in the indexing
> > > process or in the query itself.
> > >
> > > (2) And, each pub_date field of each document (which is in XML format)
> > > contains only one instance of the date string.
> > >
> > > (3) And only the pub_date field itself is indexed. There are other
> > > attributes of this field that may contain the date string, but they
> > > aren't indexed - that is, they are not included in the instantiated
> > > Document class.
> > >
> > > Regards,
> > >
> > > Terry
> > >
> > > ----- Original Message -----
> > > From: "Doug Cutting" <[EMAIL PROTECTED]>
> > > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > > Sent: Wednesday, September 17, 2003 5:51 PM
> > > Subject: Re: Lucene Scoring Behavior
> > >
> > >>Terry Steichen wrote:
> > >>
> > >>> 0.03125 = fieldNorm(field=pub_date, doc=90992)
> > >>> 1.0 = fieldNorm(field=pub_date, doc=90970)
> > >>
> > >>It looks like the fieldNorm's are what differ, not the IDFs. These are
> > >>the product of the document and/or field boost, and 1/sqrt(numTerms)
> > >>where numTerms is the number of terms in the "pub_date" field of the
> > >>document. Thus if each document is only assigned one date, and you
> > >>didn't boost the field or the document when you indexed it, this should
> > >>be 1.0. But if the document has two dates, then this would be
> > >>1/sqrt(2). Or if you boosted this document pub_date field, then this
> > >>will have whatever boost you provided.
> > >>
> > >>So, did you boost anything when indexing? Or could a single document
> > >>have two or more different values for pub_date? Either would explain
> > >>this.
> > >>
> > >>Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
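
For anyone reading this thread later, two pieces of arithmetic from it can be sketched in plain Java: the fieldNorm formula Doug describes (boost times 1/sqrt(numTerms)), and the rescaling Terry asks about (top hit set to 1, the rest proportionately lower). This is only a rough illustration under stated assumptions. It does not call any Lucene API, the names ScoreSketch, fieldNorm and normalizeToTop are made up for this example, and it ignores any rounding Lucene may apply when it stores norms in the index.

```java
public class ScoreSketch {

    // Field norm as described in the thread: boost * 1/sqrt(numTerms).
    // Hypothetical helper, not a Lucene method.
    public static float fieldNorm(float boost, int numTerms) {
        if (numTerms <= 0) {
            throw new IllegalArgumentException("numTerms must be positive");
        }
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // Rescale raw scores so the highest becomes 1.0 and the others keep
    // their proportion to it. Returns an unscaled copy if no score is > 0.
    public static float[] normalizeToTop(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = (max > 0f) ? scores[i] / max : scores[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // One term, no boost: the case Doug says should give 1.0.
        System.out.println(fieldNorm(1.0f, 1));
        // Two terms, no boost: 1/sqrt(2).
        System.out.println(fieldNorm(1.0f, 2));

        // Made-up raw scores in the .06 to .07 range mentioned in the thread.
        float[] raw = {0.06f, 0.07f, 0.065f};
        for (float s : normalizeToTop(raw)) {
            System.out.println(s);
        }
    }
}
```

As pure arithmetic, 1/sqrt(1024) is 1/32 = 0.03125, the same number as the lower fieldNorm quoted above. That is an observation about the formula only, not a diagnosis of what happened in the original index. The rescaling is done on the caller's side after the search, so it changes only how scores are displayed, not how hits are ranked.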
