[
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113958#comment-13113958
]
hadas raviv commented on LUCENE-2959:
-------------------------------------
Hi,
First of all, I would like to thank you for the great contribution you made by
adding state-of-the-art ranking methods to Lucene. I have been waiting for
these features for a long time, since they enable an IR researcher like me to
use Lucene, which is a powerful tool, for research purposes.
I downloaded the latest version of Lucene trunk and played a little with the
models you implemented. I have a question and would really appreciate your
answer (my apologies in advance: I'm new to Lucene, so maybe this question is
trivial for you):
I saw that you didn't change Lucene's default encoding of the document length,
which is used for ranking in language models (a single byte encodes the
document length together with boosting). Why did you decide that? Is it
possible to store the "real" document length, encoded in some other way (maybe
with the new flexible index)? Is there any example of such an implementation?
I'm concerned about the effect that an inaccurate document length has on
result quality. Did you check this issue?
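To make the concern concrete, here is a minimal, self-contained sketch of how
much precision a one-byte norm can lose. It is modeled on my understanding of
Lucene's SmallFloat.floatToByte315 and assumes the norm is 1/sqrt(length) with
no boost; the class and method names are hypothetical, and the bit layout
should be treated as an assumption rather than as the actual trunk code.

```java
public class NormPrecisionSketch {
    // Assumed layout: byte value = (float bits >> 21) minus this offset.
    static final int ZERO_POINT = (63 - 15) << 3;

    // Lossy float -> byte encoding (assumption: mirrors floatToByte315).
    public static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ZERO_POINT) {
            // Underflow: zero or negative input maps to 0, tiny positives to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            // Overflow: clamp to the largest representable byte.
            return (byte) -1;
        }
        return (byte) (small - ZERO_POINT);
    }

    // Byte -> float decoding; inverse of encode up to truncation.
    public static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Assumed length normalization: 1/sqrt(length), boost of 1.
    public static float norm(int length) {
        return (float) (1.0 / Math.sqrt(length));
    }

    // Document length implied by the norm after an encode/decode round trip.
    public static double decodedLength(int length) {
        float n = decode(encode(norm(length)));
        return 1.0 / ((double) n * n);
    }

    public static void main(String[] args) {
        for (int len : new int[] {84, 100, 113, 114}) {
            System.out.printf("length=%d byte=%d decodedLength=%.1f%n",
                    len, encode(norm(len)), decodedLength(len));
        }
    }
}
```

Under these assumptions, documents of 84 to 113 terms all share one norm byte
and decode to a length of roughly 113.8, while a document of 114 terms falls
into the next bucket. This is the kind of inaccuracy I am asking about.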
In addition, do you know of any plans to implement more advanced ranking
models (such as relevance models or MRF) in the near future?
Thanks in advance,
Hadas
> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
> Key: LUCENE-2959
> URL: https://issues.apache.org/jira/browse/LUCENE-2959
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/query/scoring, general/javadocs, modules/examples
> Reporter: David Mark Nemeskey
> Assignee: Robert Muir
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: flexscoring branch, 4.0
>
> Attachments: LUCENE-2959.patch, LUCENE-2959.patch,
> LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch,
> implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the
> architecture is tailored specifically to VSM, which makes the addition of new
> ranking functions a non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to
> implement a query architecture with pluggable ranking functions.
> The wiki page for the project can be found at
> http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.