On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey <[email protected]> wrote:
> Hi all,
>
> I have already sent this mail to Simon Willnauer, and he suggested me to
> post it here for discussion.
>
> I am David Nemeskey, a PhD student at the Eotvos Lorand University,
> Budapest, Hungary. I am doing an IR-related research, and we have
> considered using Lucene as our search engine. We were quite satisfied with
> the speed and ease of use. However, we would like to experiment with
> different ranking algorithms, and this is where problems arise. Lucene only
> supports the VSM, and unfortunately the ranking architecture seems to be
> tailored specifically to its needs.
>
> I would be very much interested in revamping the ranking component as a
> GSoC project. The following modifications should be doable in the allocated
> time frame:
> - a new ranking class hierarchy, which is generic enough to allow easy
>   implementation of new weighting schemes (at least bag-of-words ones),
> - addition of state-of-the-art ranking methods, such as Okapi BM25,
>   proximity and DFR models,
> - configuration for ranking selection, with the old method as default.
>
> I believe all users of Lucene would profit from such a project. It would
> provide the scientific community with an even more useful research aid,
> while regular users could benefit from superior ranking results.
>
> Please let me know your opinion about this proposal.

Hi David,

Honestly, this sounds fantastic. It would be great to have someone work with us on this issue! To date, progress has been pretty slow-going (minor improvements, cleanups, additional stats here and there), and we really need all the help we can get, especially from people who have a really good understanding of the various models.

In case you are interested, here are some references to discussions about adding more flexibility (with some prototypes, etc.):

http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
https://issues.apache.org/jira/browse/LUCENE-2392
