Hi guys, Mark, Robert, Simon: thanks for the support! I really hope we can work together this summer (and before that, obviously).
According to the timeline at http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline, there is still some time before the application period. So let me use this week to finish my PhD research plan, and I will get back to you next week. I am not really familiar with how the program works (e.g. how detailed the application description should be, when mentorship is decided, etc.), so I guess we will have a lot to talk about. :)

(Actually, should we take this discussion off the list?)

David

> Hi David, honestly this sounds fantastic.
>
> It would be great to have someone to work with us on this issue!
>
> To date, progress is pretty slow-going (minor improvements, cleanups,
> additional stats here and there)... but we really need all the help we
> can get, especially from people who have a really good understanding
> of the various models.
>
> In case you are interested, here are some references to discussions
> about adding more flexibility (with some prototypes, etc.):
> http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
> https://issues.apache.org/jira/browse/LUCENE-2392
>
> On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey
> <[email protected]> wrote:
> > Hi all,
> >
> > I have already sent this mail to Simon Willnauer, and he suggested that
> > I post it here for discussion.
> >
> > I am David Nemeskey, a PhD student at the Eotvos Lorand University,
> > Budapest, Hungary. I am doing IR-related research, and we have
> > considered using Lucene as our search engine. We were quite satisfied
> > with its speed and ease of use. However, we would like to experiment
> > with different ranking algorithms, and this is where problems arise.
> > Lucene only supports the vector space model (VSM), and unfortunately
> > the ranking architecture seems to be tailored specifically to its needs.
> >
> > I would be very much interested in revamping the ranking component as a
> > GSoC project.
> > The following modifications should be doable in the
> > allocated time frame:
> > - a new ranking class hierarchy, which is generic enough to allow easy
> >   implementation of new weighting schemes (at least bag-of-words ones),
> > - addition of state-of-the-art ranking methods, such as Okapi BM25,
> >   proximity and DFR models,
> > - configuration for ranking selection, with the old method as default.
> >
> > I believe all users of Lucene would profit from such a project. It would
> > provide the scientific community with an even more useful research aid,
> > while regular users could benefit from superior ranking results.
> >
> > Please let me know your opinion about this proposal.
