[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

hadas raviv (JIRA) Sat, 24 Sep 2011 05:35:54 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113958#comment-13113958
 ]


hadas raviv commented on LUCENE-2959:
-------------------------------------

Hi,

First of all, thank you for the great contribution you made by adding 
state-of-the-art ranking methods to Lucene. I have been waiting for these 
features for a long time, since they enable an IR researcher like me to use 
Lucene, which is a powerful tool, for research purposes.

I downloaded the latest version of Lucene trunk and experimented a little with 
the models you implemented. I have a question and would really appreciate 
your answer (my apologies in advance - I'm new to Lucene, so this question 
may be trivial for you):

I saw that you didn't change Lucene's default encoding of the document 
length, which is used for ranking in the language models (a single byte that 
encodes the document length together with the boost). Why did you decide to 
keep it? Is it possible to store the "real" document length in some other 
way (maybe with the new flexible index)? Is there an example of such an 
implementation? I'm concerned about the effect of an inaccurate document 
length on the quality of the results. Did you look into this issue?
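
To make the concern concrete, here is a minimal sketch of how a one-byte 
encoding can blur document lengths. It is only an illustration under my own 
assumptions: the class OneByteLengthSketch and its methods are hypothetical, 
the stored value is assumed to be 1/sqrt(length) with no boost, and the byte 
layout (a few exponent bits plus two explicit mantissa bits of the float) is 
my guess at such a scheme, not necessarily what Lucene actually does.

```java
// Hypothetical illustration, not Lucene code: squeeze 1/sqrt(length) into one byte.
final class OneByteLengthSketch {

    // Assumed encoder: keep only the top bits of the float (part of the exponent
    // plus two explicit mantissa bits), rebased so the result fits in 0..255.
    public static byte encode(int length) {
        float norm = (float) (1.0 / Math.sqrt(length));
        int bits = Float.floatToRawIntBits(norm);
        int small = (bits >> 21) - (48 << 3);
        if (small <= 0) {
            // Too small to represent: 0 for zero/negative input, else the smallest code.
            return (byte) (bits <= 0 ? 0 : 1);
        }
        if (small >= 0x100) {
            return (byte) 0xFF; // too large to represent: clamp to the largest code
        }
        return (byte) small;
    }

    // Inverse of encode(): rebuild a float from the byte; the dropped bits are lost.
    public static float decode(byte b) {
        if (b == 0) {
            return 0f;
        }
        int bits = ((b & 0xFF) << 21) + (48 << 24);
        return Float.intBitsToFloat(bits);
    }

    // Document length implied by a stored byte, i.e. 1 / norm^2.
    public static double decodedLength(byte b) {
        double norm = decode(b);
        return 1.0 / (norm * norm);
    }

    public static void main(String[] args) {
        // Print the stored byte and the length recovered from it for a few lengths.
        for (int len : new int[] {84, 100, 113, 114, 200}) {
            byte b = encode(len);
            System.out.printf("length=%d byte=%d decodedLength~%.1f%n",
                    len, b & 0xFF, decodedLength(b));
        }
    }
}
```

Under these assumptions, lengths that differ by tens of terms can end up in 
the same byte, and the length recovered from the byte is a bucket value rather 
than the original length. That is the inaccuracy I am asking about.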

In addition, do you know of any plans to implement more advanced ranking 
models (such as relevance models or MRF) in the near future?

Thanks in advance,
Hadas

> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-2959
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2959
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/query/scoring, general/javadocs, modules/examples
>            Reporter: David Mark Nemeskey
>            Assignee: Robert Muir
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: flexscoring branch, 4.0
>
>         Attachments: LUCENE-2959.patch, LUCENE-2959.patch, 
> LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, 
> implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the
> architecture is tailored specifically to VSM, which makes the addition of new
> ranking functions a non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to
> implement a query architecture with pluggable ranking functions.
> The wiki page for the project can be found at 
> http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

