Doug/Erik,
I do use RangeQuery to get a range of dates, but in this case I'm only
searching for a single date (a string), so I believe it's just a plain
single-term query.
Per Erik's suggestion, I checked the Explanation for some of these
anomalies. I've included a condensed version of its output below
(which I frankly don't understand). Perhaps it will give you or
Erik some insight into what's happening?
Regards,
Terry
PS: I notice that the 'docFreq' values displayed below correspond exactly
to the number of hits for each query. Also, here's the Similarity class I'm
using (per an earlier suggestion of Doug's):
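import org.apache.lucene.search.DefaultSimilarity;

// Clamps every field's term count to at least 750 before applying the default
// length normalization; headline, summary and ssummary get a further 4x boost.
public class WESimilarity2 extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTerms) {
        if (fieldName.equals("headline") || fieldName.equals("summary") ||
                fieldName.equals("ssummary")) {
            return 4.0f * super.lengthNorm(fieldName, Math.max(numTerms, 750));
        } else {
            // All other fields (including pub_date) use the clamped default norm.
            return super.lengthNorm(fieldName, Math.max(numTerms, 750));
        }
    }
}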
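For a one-term field like pub_date, the effect of the Math.max(numTerms, 750) clamp can be checked with plain arithmetic. This is a minimal sketch that assumes DefaultSimilarity.lengthNorm returns 1/sqrt(numTerms); LengthNormSketch and its methods are hypothetical names, not Lucene API. Lucene stores each norm in a single lossy byte, so the fieldNorm that Explanation reports may be a coarser value than the one computed here.

```java
// Hypothetical helper reproducing the lengthNorm arithmetic outside Lucene.
// Assumption: DefaultSimilarity.lengthNorm(field, n) == 1 / sqrt(n).
public class LengthNormSketch {

    // What the default similarity would return (assumed formula).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // What WESimilarity2 returns for fields other than headline/summary/ssummary:
    // the term count is clamped to at least 750 before the default formula runs.
    static float clampedLengthNorm(int numTerms) {
        return defaultLengthNorm(Math.max(numTerms, 750));
    }

    public static void main(String[] args) {
        int numTerms = 1; // a pub_date value such as "20030917" is a single term
        System.out.println("default: " + defaultLengthNorm(numTerms)); // 1.0
        System.out.println("clamped: " + clampedLengthNorm(numTerms)); // about 0.0365
    }
}
```

In other words, the clamp gives a one-term date field the same norm as a 750-term field, roughly 0.0365 instead of 1.0.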
Query #1: pub_date:20030917
All items: Score: .23000652
0.23000652 = weight(pub_date:20030917 in 91197), product of:
  0.99999994 = queryWeight(pub_date:20030917), product of:
    7.360209 = idf(docFreq=157)
    0.1358657 = queryNorm
  0.23000653 = fieldWeight(pub_date:20030917 in 91197), product of:
    1.0 = tf(termFreq(pub_date:20030917)=1)
    7.360209 = idf(docFreq=157)
    0.03125 = fieldNorm(field=pub_date, doc=91197)
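The Query #1 output can be reproduced by hand. This is a minimal sketch assuming DefaultSimilarity's queryNorm of 1/sqrt(sum of squared weights), which for a single unboosted term is just 1/idf; that would explain why queryWeight comes out as 0.99999994 and the score reduces to tf * idf * fieldNorm. WeightSketch is a hypothetical class; the numeric inputs are copied from the output above.

```java
// Hypothetical reconstruction of the Explanation arithmetic for one term query.
public class WeightSketch {

    // Assumed DefaultSimilarity.queryNorm: 1 / sqrt(sum of squared weights).
    // For a single unboosted term the sum is just idf * idf.
    static float queryNorm(float idf) {
        return (float) (1.0 / Math.sqrt(idf * idf));
    }

    // queryWeight = idf * queryNorm
    static float queryWeight(float idf) {
        return idf * queryNorm(idf);
    }

    // fieldWeight = tf * idf * fieldNorm
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // weight (the raw score) = queryWeight * fieldWeight
    static float weight(float tf, float idf, float fieldNorm) {
        return queryWeight(idf) * fieldWeight(tf, idf, fieldNorm);
    }

    public static void main(String[] args) {
        float idf = 7.360209f; // idf(docFreq=157) from the Query #1 output
        System.out.println("queryNorm   = " + queryNorm(idf));              // ~0.1358657
        System.out.println("queryWeight = " + queryWeight(idf));            // ~1.0
        System.out.println("weight      = " + weight(1.0f, idf, 0.03125f)); // ~0.23000652
    }
}
```

Plugging in fieldNorm = 1.0 with idf = 7.258018 gives the 7.258 raw weight shown for the first two Query #3 hits.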
Query #2: pub_date:20030916
All items: Score: .22295427
0.22295427 = fieldWeight(pub_date:20030916 in 90992), product of:
  1.0 = tf(termFreq(pub_date:20030916)=1)
  7.1345367 = idf(docFreq=197)
  0.03125 = fieldNorm(field=pub_date, doc=90992)
Query #3: pub_date:20030915
Items 1&2: Score: 1.0
7.2580175 = weight(pub_date:20030915 in 90970), product of:
  0.99999994 = queryWeight(pub_date:20030915), product of:
    7.258018 = idf(docFreq=174)
    0.13777865 = queryNorm
  7.258018 = fieldWeight(pub_date:20030915 in 90970), product of:
    1.0 = tf(termFreq(pub_date:20030915)=1)
    7.258018 = idf(docFreq=174)
    1.0 = fieldNorm(field=pub_date, doc=90970)
Query #3 (same as above): pub_date:20030915
Other items: Score: .03125
0.22681305 = weight(pub_date:20030915 in 90826), product of:
  0.99999994 = queryWeight(pub_date:20030915), product of:
    7.258018 = idf(docFreq=174)
    0.13777865 = queryNorm
  0.22681306 = fieldWeight(pub_date:20030915 in 90826), product of:
    1.0 = tf(termFreq(pub_date:20030915)=1)
    7.258018 = idf(docFreq=174)
    0.03125 = fieldNorm(field=pub_date, doc=90826)
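The 1.0 / .03125 split in Query #3 is consistent with a simple rule: if the top raw score exceeds 1.0, every score is divided by it; otherwise scores are left alone (the behavior guessed at in the original message quoted below). The sketch applies that rule to the raw weights above; NormalizeSketch is a hypothetical class and the rule is an assumption here, not code taken from Lucene.

```java
import java.util.Arrays;

// Hypothetical sketch of hit-score normalization under the assumed rule:
// divide by the top raw score only when that score exceeds 1.0.
public class NormalizeSketch {

    static float[] normalize(float[] raw) {
        float max = 0.0f;
        for (float s : raw) {
            max = Math.max(max, s);
        }
        float scale = max > 1.0f ? 1.0f / max : 1.0f; // only scale down, never up
        float[] out = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        // Query #3 raw weights: two hits with fieldNorm 1.0, one representative
        // of the remaining hits with fieldNorm 0.03125.
        float[] query3 = {7.2580175f, 7.2580175f, 0.22681305f};
        // Query #1: every hit has the same raw score, which is below 1.0.
        float[] query1 = {0.23000652f, 0.23000652f};
        System.out.println(Arrays.toString(normalize(query3))); // ~[1.0, 1.0, 0.03125]
        System.out.println(Arrays.toString(normalize(query1))); // unchanged
    }
}
```

Under that rule the two hits with fieldNorm 1.0 push the top raw score to 7.258, which triggers normalization, and every other hit is scaled by the same factor: 0.22681305 / 7.2580175 = 0.03125, exactly the ratio of the two fieldNorm values. In Query #1 no raw score exceeds 1.0, so the scores are reported as-is.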
Query #4: pub_date:20030914
0.21384604 = weight(pub_date:20030914 in 90417), product of:
  0.99999994 = queryWeight(pub_date:20030914), product of:
    6.843074 = idf(docFreq=264)
    0.14613315 = queryNorm
  0.21384606 = fieldWeight(pub_date:20030914 in 90417), product of:
    1.0 = tf(termFreq(pub_date:20030914)=1)
    6.843074 = idf(docFreq=264)
    0.03125 = fieldNorm(field=pub_date, doc=90417)
Query #5: pub_date:20030913
Items 1&2: Score: 1.0
7.366558 = fieldWeight(pub_date:20030913 in 90591), product of:
  1.0 = tf(termFreq(pub_date:20030913)=1)
  7.366558 = idf(docFreq=156)
  1.0 = fieldNorm(field=pub_date, doc=90591)
Query #5 (same as above): pub_date:20030913
Other items: Score: .03125
0.23020494 = fieldWeight(pub_date:20030913 in 90383), product of:
  1.0 = tf(termFreq(pub_date:20030913)=1)
  7.366558 = idf(docFreq=156)
  0.03125 = fieldNorm(field=pub_date, doc=90383)
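The docFreq/idf pairs above also fit the DefaultSimilarity idf formula, ln(numDocs / (docFreq + 1)) + 1, which is what makes rarer dates score higher, as Doug notes below. In this sketch numDocs = 91382 is an assumption: it was back-solved from the idf values in the output and is not a figure reported in the thread. IdfSketch is a hypothetical class.

```java
// Hypothetical sketch of the assumed DefaultSimilarity idf formula:
//   idf = ln(numDocs / (docFreq + 1)) + 1
public class IdfSketch {

    // Assumed index size, back-solved from the idf values in the Explanation output.
    static final int NUM_DOCS = 91382;

    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // docFreq values (equal to the hit counts) from the output above.
        int[] docFreqs = {156, 157, 174, 197, 264};
        for (int df : docFreqs) {
            System.out.printf("docFreq=%d  idf=%.6f%n", df, idf(df, NUM_DOCS));
        }
    }
}
```

Since each single-term score is roughly idf * fieldNorm, the date with the fewest documents (20030913, docFreq 156) gets the largest idf and the date with the most (20030914, docFreq 264) the smallest.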
----- Original Message -----
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 4:55 PM
Subject: Re: Lucene Scoring Behavior
> If you're using RangeQuery to do date searching, then you'll likely see
> unusual scoring. The IDF of a date, like any other term, is inversely
> related to the number of documents with that date. So documents whose
> dates are rare will score higher, which is probably not what you intend.
>
> Using a Filter for date searching is one way to remove dates from the
> scoring calculation. Another is to provide a Similarity implementation
> that gives an IDF of 1.0 for terms from your date field, e.g., something
> like:
>
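> import java.io.IOException;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.DefaultSimilarity;
> import org.apache.lucene.search.Searcher;
>
> public class MySimilarity extends DefaultSimilarity {
>     // Give every term in the date field a neutral IDF so that rare
>     // dates no longer outscore common ones.
>     public float idf(Term term, Searcher searcher) throws IOException {
>         if ("date".equals(term.field())) {
>             return 1.0f;
>         } else {
>             return super.idf(term, searcher);
>         }
>     }
> }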
>
> Or you could just give date clauses of your query a very small boost
> (e.g., .0001) so that other clauses dominate the scoring.
>
> Doug
>
> Terry Steichen wrote:
> > I've run across some puzzling behavior regarding scoring. I have a set
> > of documents which contain, among others, a date field (whose contents
> > are a string in YYYYMMDD format). When I query on the date 20030917
> > (that is, today), I get 157 hits, all of which have a score of
> > .23000652. If I use 20030916 (yesterday), I get 197 hits, each of which
> > has a score of .22295427.
> >
> > So far, all seems logical. However, when I search for all records for
> > the date 20030915, the first two (of 174 hits) have a score of 1.0,
> > while all the rest of the hits have a score of .03125. Here is a
> > tabulation of these and a few more queries:
> >
> > Query Date   Hits   Result
> > ==========   ====   =============================================
> > 20030917     157    all have a score of .23000652
> > 20030916     197    all have a score of .22295427
> > 20030915     174    first 2 have a score of 1.0, all the rest .03125
> > 20030914     264    all have a score of .21384604
> > 20030913     156    first 2 have a score of 1.0, all the rest .03125
> > 20030912     241    all have a score of .2166833
> > 20030911     244    first 3 have a score of 1.0, all the rest .03125
> > 20030910     211    all have a score of .2208193
> >
> > I would expect all the hits to have the same score, and I would expect
> > that score to be normalized to 1 (unless, I guess, the top score was
> > less than 1, in which case normalization presumably doesn't occur).
> >
> > Does anyone have any ideas as to what might be going on here? (I'm
> > using the latest CVS sources, obtained this afternoon.)
> >
> > Regards,
> >
> > Terry
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>