Dear all,

My primary research interest is information retrieval, with a focus on developing effective and robust retrieval models. I am happy to send my first email to the Lucene community.
Lucene and Nutch are really useful IR systems. However, I think that the retrieval function currently implemented in Lucene is less effective than other state-of-the-art retrieval functions. I have implemented some state-of-the-art models (such as pivoted normalization, Okapi, and axiomatic retrieval models) on top of Lucene, and evaluated them against Lucene's default model using standard IR evaluation methodology. The experiments show that the state-of-the-art retrieval functions outperform the default one.

This work actually began as an assignment that my advisor and I designed for our IR course. After we posted the assignment online, quite a few IR researchers contacted us to ask for the code of our implementations. So we think it might benefit both the Lucene community and the IR research community if we contributed our implementations of the state-of-the-art retrieval functions to Lucene. I think our contribution could help improve retrieval performance for both Lucene and Nutch.

What do you think?

Thanks,
-Hui