On Feb 2, 2011, at 4:10 AM, David Nemeskey wrote:

> Hi guys,
>
> Mark, Robert, Simon: thanks for the support! I really hope we can work
> together this summer (and before that, obviously).

Sounds like a great idea. Looking forward to the proposal.

>
> According to http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline,
> there's still some time until the application period. So let me use this
> week to finish my PhD research plan, and get back to you next week.
>
> I am not really familiar with how the program works, i.e. how detailed the
> application description should be, when mentorship is decided, etc., so I
> guess we will have a lot to talk about. :)

It's pretty competitive, especially since you are not only competing against others for Lucene slots, but also against other ASF projects. I highly recommend that you, as well as interested mentors, look through Mahout's past GSOC projects:
http://www.lucidimagination.com/search/?q=GSOC#/p:mahout
http://www.lucidimagination.com/search/document/2acd6fd380feec3/thoughts_on_gsoc
https://cwiki.apache.org/confluence/display/MAHOUT/GSOC

>
> (Actually, should we move this discussion private?)

No, you shouldn't, and it would be to your detriment come the ranking process, since people won't have a track record of what you've done as it relates to your proposal. The goal of GSOC is to learn how Open Source works. Even though you have a mentor, that person is there to help you navigate the community, not to be a private tutor on technical details. I routinely tell all my students that I will help them w/ personal issues (vacation, emergencies, etc.) but that all technical stuff must be done on list (JIRA, IRC, dev@, patches, etc.).

>
> David
>
>> Hi David, honestly this sounds fantastic.
>>
>> It would be great to have someone to work with us on this issue!
>>
>> To date, progress is pretty slow-going (minor improvements, cleanups,
>> additional stats here and there)... but we really need all the help we
>> can get, especially from people who have a really good understanding
>> of the various models.
>>
>> In case you are interested, here are some references to discussions
>> about adding more flexibility (with some prototypes, etc.):
>> http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
>> https://issues.apache.org/jira/browse/LUCENE-2392
>
>> On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey
>> <[email protected]> wrote:
>>> Hi all,
>>>
>>> I have already sent this mail to Simon Willnauer, and he suggested that I
>>> post it here for discussion.
>>>
>>> I am David Nemeskey, a PhD student at Eotvos Lorand University,
>>> Budapest, Hungary. I am doing IR-related research, and we have
>>> considered using Lucene as our search engine. We were quite satisfied
>>> with its speed and ease of use. However, we would like to experiment
>>> with different ranking algorithms, and this is where problems arise.
>>> Lucene only supports the VSM, and unfortunately the ranking architecture
>>> seems to be tailored specifically to its needs.
>>>
>>> I would be very much interested in revamping the ranking component as a
>>> GSoC project. The following modifications should be doable in the
>>> allocated time frame:
>>> - a new ranking class hierarchy, which is generic enough to allow easy
>>>   implementation of new weighting schemes (at least bag-of-words ones),
>>> - addition of state-of-the-art ranking methods, such as Okapi BM25,
>>>   proximity and DFR models,
>>> - configuration for ranking selection, with the old method as default.
>>>
>>> I believe all users of Lucene would profit from such a project. It would
>>> provide the scientific community with an even more useful research aid,
>>> while regular users could benefit from superior ranking results.
>>>
>>> Please let me know your opinion about this proposal.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search
