[jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene

Hui Fang (JIRA) Thu, 20 Aug 2009 13:08:49 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745608#action_12745608
 ]


Hui Fang commented on LUCENE-965:
---------------------------------

Hello everyone, 

We have re-implemented the retrieval functions in a very different way. The 
main differences are (1) the average document length is no longer computed 
during the retrieval process, as it was in the previous implementation, which 
should make retrieval more efficient, and (2) instead of modifying the 
existing search-related classes, we integrate the new retrieval functions 
through two new classes, AXTermQuery and AXTermScorer, which extend the 
TermQuery and TermScorer classes. I think the current implementation 
addresses most of the concerns raised in this discussion thread. 

The source code and the updated reports of our implementation are now 
available at http://www.ece.udel.edu/~hfang/LuceneAX.html. We have implemented 
two slightly different versions, for lucene-2.4.1 and lucene-2.9-dev. We hope 
that the implementation of the axiomatic retrieval function can be integrated 
into a future release of Lucene. Please feel free to let me know if you have 
any questions or comments. 

Thanks,
-Hui 

> Implement a state-of-the-art retrieval function in Lucene
> ---------------------------------------------------------
>
>                 Key: LUCENE-965
>                 URL: https://issues.apache.org/jira/browse/LUCENE-965
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>            Reporter: Hui Fang
>             Fix For: 3.0
>
>         Attachments: axiomaticFunction.patch
>
>
> We implemented the axiomatic retrieval function, a state-of-the-art 
> retrieval function, to replace the default similarity function in Lucene. 
> We compared the performance of these two functions and reported the results 
> at http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf. 
> The report shows that the performance of the axiomatic retrieval function is 
> much better than the default function. The axiomatic retrieval function is 
> able to find more relevant documents and users can see more relevant 
> documents in the top-ranked documents. Incorporating such a state-of-the-art 
> retrieval function could improve the search performance of all the 
> applications which were built upon Lucene. 
> Most changes related to the implementation are made in AXSimilarity, 
> TermScorer and TermQuery.java. However, many test cases are hand-coded to 
> check that the implementation of the default function is correct, so I also 
> modified many test files so that the new retrieval function passes those 
> cases. In fact, we found that some old test cases are not reasonable. For 
> example, in testQueries02 of TestBoolean2.java, the query is "+w3 xx", and 
> we have two documents, "w1 xx w2 yy w3" and "w1 w3 xx w2 yy w3". 
> The second document should be more relevant than the first one, because it 
> has more occurrences of the query term "w3". But the original test case 
> requires us to rank the first document higher than the second one, which is 
> not reasonable. 
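
To make the "+w3 xx" example concrete, here is a minimal, self-contained sketch of how an axiomatic-style score could rank the two documents. It is not the code from the patch. It assumes the F2-EXP form from the axiomatic retrieval literature, score = sum over matching query terms of (N/df)^k * tf / (tf + s + s*len/avgLen), with assumed parameter values k = 0.35 and s = 0.5, and it treats the two documents as the whole collection. The class name AxiomaticSketch and its methods are hypothetical and do not use any Lucene API. The average document length is computed once in the constructor rather than at scoring time.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not the AXTermScorer from the patch.
final class AxiomaticSketch {
    // Assumed parameter values; the message does not state them.
    static final double K = 0.35;
    static final double S = 0.5;

    private final List<List<String>> docs;
    private final double avgLen; // computed once, not per query

    AxiomaticSketch(List<List<String>> docs) {
        this.docs = docs;
        this.avgLen = docs.stream().mapToInt(List::size).average().orElse(1.0);
    }

    // Number of occurrences of term in a tokenized document.
    private static long termFreq(List<String> doc, String term) {
        return doc.stream().filter(term::equals).count();
    }

    // Assumed F2-EXP-style score of document docId for the given query terms.
    double score(List<String> query, int docId) {
        List<String> doc = docs.get(docId);
        double lengthNorm = S + S * doc.size() / avgLen;
        double total = 0.0;
        for (String term : query) {
            long tf = termFreq(doc, term);
            if (tf == 0) {
                continue; // only terms present in the document contribute
            }
            long df = docs.stream().filter(d -> d.contains(term)).count();
            double idf = Math.pow((double) docs.size() / df, K);
            total += idf * tf / (tf + lengthNorm);
        }
        return total;
    }

    public static void main(String[] args) {
        AxiomaticSketch index = new AxiomaticSketch(Arrays.asList(
                Arrays.asList("w1", "xx", "w2", "yy", "w3"),
                Arrays.asList("w1", "w3", "xx", "w2", "yy", "w3")));
        List<String> query = Arrays.asList("w3", "xx");
        System.out.println("doc 0: " + index.score(query, 0));
        System.out.println("doc 1: " + index.score(query, 1));
    }
}
```

Under these assumptions the second document scores higher than the first: its extra occurrence of "w3" outweighs the small length penalty. The required-clause marker "+" is ignored here because both documents contain "w3".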

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

