Karl,

On Friday 19 March 2004 18:24, Karl Koch wrote:
> Hello group,
>
> coming back to the discussion about probabilistic and vector space model
> (which occured here some time ago), I would like to ask something related.
>
> I only know the index structure Lucene offers. Does a IR system, based on
> the probabilistic model (e.g. Okapi) look different from a VS model? If
> yes, why?
>
> I hope this questions is not too stupid. I am mainly interested because of
> some theoretical background...
>
> Karl

First off: I don't know the fine points of the differences between
probabilistic and VS models. Some time ago I made a quick comparison
between the default scoring method of Lucene and the Okapi model.
Off the top of my head I remember this (it is not complete):

Similarities:
- both do term weighting by inverse document frequency,
- both normalize for document length, effectively using term density,
- both have a saturation for this term density.

Differences:
- Okapi can also use the document length by itself.
- Lucene has a factor (coord) for the overlap between a query and a
  document (i.e. the number of matching query terms present in a document).
- The term density saturation functions are different, too: Lucene uses
  a square root, Okapi uses an (increasing) reciprocal; however, in
  practice the limit of the reciprocal is far from reached.

When the overlap is ignored, from a practical viewpoint, I would be
surprised if the two methods ordered a given set of docs much
differently for the same query. I'd expect most differences in the
'middle', due to the differences in the form (2nd derivative) of the
saturation functions.

Coming back to your question:

> I only know the index structure Lucene offers. Does a IR system, based on
> the probabilistic model (e.g. Okapi) look different from a VS model? If
> yes, why?

My guess is that, in practice (i.e. in the orderings of documents for
queries), the two systems are much more similar than different.

> I hope this questions is not too stupid. I am mainly interested because of
> some theoretical background...

Do you intend to do a theoretical comparison of the scoring functions of
Lucene and Okapi? AFAIK this has not been investigated.

Kind regards,
Ype Kingma
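
P.S. A minimal sketch of the two term density saturation functions compared above, for anyone who wants to look at the shapes. It assumes the square root is applied to the raw term frequency and that the Okapi reciprocal takes the BM25 form tf * (k1 + 1) / (tf + k1) with the document length term left out. The function names are made up for illustration, and k1 = 1.2 is a commonly quoted default, not a value taken from this thread.

```python
import math


def lucene_tf(freq):
    """Square-root saturation, as in Lucene's default scoring."""
    return math.sqrt(freq)


def okapi_tf(freq, k1=1.2):
    """Increasing reciprocal of BM25 form; approaches k1 + 1 as freq grows.

    k1 = 1.2 is an assumed, commonly quoted default. Document length
    normalization is left out to isolate the saturation shape.
    """
    return freq * (k1 + 1.0) / (freq + k1)


if __name__ == "__main__":
    # Print both factors side by side for a few term frequencies.
    for freq in (1, 2, 4, 8, 16, 32):
        print(f"tf={freq:2d}  lucene={lucene_tf(freq):.3f}  okapi={okapi_tf(freq):.3f}")
```

Both functions are increasing and concave. The square root keeps growing without bound, while the reciprocal stays below k1 + 1, so under these assumptions the two curves differ mainly in how quickly they flatten.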
