Hello Steven,

I looked up the paper and read the relevant part. The quote you provided is from the introduction. I believe it refers to the basic purpose of an information retrieval system in general, or at least to the purpose of a vector space model IR system.
If this is the theoretical justification for the coord_q_d normalisation, then it actually replicates the other part of the scoring formula to some degree. The entire formula is concerned with exactly this: comparing the term frequencies of query and document. Is there any other paper that actually shows the benefit of doing this particular normalisation with coord_q_d? I am not suggesting that it is not useful; I am just looking for evidence of how the idea developed.

Karl

-------- Original Message --------
Date: Tue, 12 Dec 2006 10:01:05 -0500
From: Steven Rowe <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor

> Karl Koch wrote:
> > The coord(q,d) normalisation is "a score factor based on how many of
> > the query terms are found in the specified document." and described
> > here:
> >
> > http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
> >
> > Does this have a theoretical base? On what basis was the decision
> > made to have it? Does anybody know a paper (in Information Retrieval,
> > Information Seeking, etc.) or other more general information about
> > this?
>
> Following is quoted from: Krovetz, R. & Croft, W. B. (1992) Lexical
> Ambiguity and Information Retrieval. ACM Transactions on Information
> Systems, 10(2): 115-141.
>
>    Many retrieval systems represent documents and queries
>    by the words they contain, and base the comparison on
>    the number of words they have in common. The more
>    words the query and document have in common, the
>    higher the document is ranked; this is referred to as
>    a "coordination match." Performance is improved by
>    weighting query and document words using frequency
>    information from the collection and individual
>    document texts [27].
>
> 27. Salton, G. & McGill, M. Introduction to Modern Information
> Retrieval. McGraw-Hill, New York, 1983.
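
To make the "coordination match" idea from the quoted passage concrete, here is a minimal sketch in Python. It assumes the factor is simply the fraction of distinct query terms that occur in the document, which matches the description "how many of the query terms are found in the specified document". The function name `coord`, the sample query and the three toy documents are all made up for illustration; this is not Lucene code and it leaves out the tf/idf part of the scoring formula entirely.

```python
def coord(query_terms, doc_terms):
    """Fraction of distinct query terms that also occur in the document.

    Illustrative assumption: overlap / number of distinct query terms.
    """
    query = set(query_terms)
    if not query:
        # An empty query matches nothing; avoid dividing by zero.
        return 0.0
    overlap = len(query & set(doc_terms))
    return overlap / len(query)


# Hypothetical query and documents, already tokenised and lower-cased.
query = ["lexical", "ambiguity", "retrieval"]
docs = {
    "d1": "lexical ambiguity and information retrieval".split(),
    "d2": "introduction to modern information retrieval".split(),
    "d3": "ranking documents by term frequency".split(),
}

# Pure coordination match: rank by how many query terms each document shares.
scores = {name: coord(query, terms) for name, terms in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 3))
```

Note that this factor ignores how often a term occurs. It only rewards a document for covering more of the query, which is one way to see how it differs from the term-frequency part of the formula.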