> Well it doesn't since there is not justification of why it is the > way it is. Its like saying, here is that car with 5 weels... enjoy driving.
> > - I think the explanations there would also answer at least some of your > > questions. I hoped it would answer *some* of the questions... (not all) Let me try some more :-) > 2a) Why does Lucene normalise with Cosine Normalisation for the > query? In a range of different IR system variations (as shown in > Chisholm and Kolda's paper in the table on page 8) queries where not > normalised at all. Is there a good reason or perhaps any empirical > evidence that would support this decision? I think "why normalizing at all" is answered in http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_queryNorm ... "queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable. This is a search time factor computed by the Similarity in effect at search time. The default computation in DefaultSimilarity is: queryNorm(q) = 1/sqrt(sumOfSquaredWeights)" I think there were discussions on this normalization. Anyhow I am not the expert to justify this. > 2b) What is the motivation for the normalisation imposed on the > documents (norm_d_t) which I have not seen before in any other > system. Again, does anybody have pointers to literature on this? If I understand what you are asking, the logic is that verrry long documents could virtually contain the entire collection, and therefore should be "punished" for being too long, otherwise they would have an "unfair advantage" upon short documents. Google "search engine document length normalization" for more on this, or see also http://trec.nist.gov/pubs/trec10/papers/JuruAtTrec.pdf > Any answer or partial answer on any of the questions would be > greatly appreciated! I hope this (although partial) helps at all, Doron --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]