[ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515783 ]
Charlie Zhao commented on LUCENE-965:
-------------------------------------

Hello Hui,

Thank you for contributing your axiomatic retrieval function to Lucene. I can't wait to take it for a test drive.

Would you please report the settings you used for this experiment?

Collection  Function        MAP    P5    P10   P20   P100  NumRR
ROBUST04    Lucene Default  0.048  0.12  0.09  0.08  0.05  21

I ask because there are disparities compared with my results:

num_q          249
num_ret        239436
num_rel        17412
num_rel_ret    9780
map            0.2076
gm_ap          0.1049
R-prec         0.2551
bpref          0.2189
recip_rank     0.5684
ircl_prn.0.00  0.6288
ircl_prn.0.10  0.4459
ircl_prn.0.20  0.3562
ircl_prn.0.30  0.2864
ircl_prn.0.40  0.2289
ircl_prn.0.50  0.1925
ircl_prn.0.60  0.145
ircl_prn.0.70  0.1062
ircl_prn.0.80  0.0702
ircl_prn.0.90  0.0461
ircl_prn.1.00  0.0261
P5             0.3944
P10            0.3598
P15            0.3307
P20            0.307
P30            0.2657
P100           0.1618
P200           0.1117
P500           0.0635
P1000          0.0393

Before we go further, let us make sure we are on the same page. Here are my settings:

Data: TREC Disks 4 & 5; 528,155 documents; 1,904 MB of text
Queries: TREC topics 301-700
Query field: <title> only
IR engine: Lucene 2.0 (I need to double-check, but it's close)
Note: default Lucene similarity function, using title words only.

If we are on the same page, then a MAP of 0.048 is terribly low for topics 301-700, whereas I get 0.2076. Still, your axiomatic retrieval function outperformed the default in many other respects. So if you could check your experimental settings, and my result turns out to be closer to the real default, then we might discover a further improvement with the axiomatic retrieval function. That is my hope.
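For readers comparing the two runs: the per-query numbers above appear to be trec_eval output, where MAP is the mean over queries of uninterpolated average precision, and P5, P10, etc. are precision at a fixed cutoff. The Java sketch below shows those standard definitions for a single ranked list. The class `RankMetrics` and its methods are hypothetical helpers (not part of Lucene or trec_eval), and details trec_eval handles, such as tie-breaking between equal scores, are ignored.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper: trec_eval-style metrics for one query's ranked result list. */
public class RankMetrics {

    /** Precision at cutoff k: relevant docs in the top k divided by k (even if fewer than k were retrieved). */
    public static double precisionAt(List<String> ranked, Set<String> relevant, int k) {
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    /**
     * Uninterpolated average precision: sum of precision at the rank of each retrieved
     * relevant doc, divided by the total number of relevant docs (num_rel), so relevant
     * docs that were never retrieved contribute 0.
     */
    public static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at this rank
            }
        }
        return sum / relevant.size();
    }

    /** MAP: arithmetic mean of per-query average precision. */
    public static double meanAveragePrecision(List<Double> perQueryAp) {
        if (perQueryAp.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double ap : perQueryAp) {
            sum += ap;
        }
        return sum / perQueryAp.size();
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("d3", "d7", "d1", "d9", "d4"); // system ranking for one query
        Set<String> relevant = Set.of("d3", "d1", "d8");            // judged relevant docs (d8 never retrieved)
        System.out.println("P5 = " + precisionAt(ranked, relevant, 5));
        System.out.println("AP = " + averagePrecision(ranked, relevant));
    }
}
```

In this toy example P5 is 2/5 = 0.4 and AP is (1/1 + 2/3) / 3 = 5/9, because the unretrieved relevant document d8 still counts in the denominator. That denominator is one reason MAP is sensitive to how many relevant documents a run retrieves at all (num_rel_ret above).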
Charlie Zhao

> Implement a state-of-the-art retrieval function in Lucene
> ---------------------------------------------------------
>
>                 Key: LUCENE-965
>                 URL: https://issues.apache.org/jira/browse/LUCENE-965
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>            Reporter: Hui Fang
>         Attachments: axiomaticFunction.patch
>
>
> We implemented the axiomatic retrieval function, which is a state-of-the-art
> retrieval function, to replace the default similarity function in Lucene. We
> compared the performance of these two functions and reported the results at
> http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf.
> The report shows that the performance of the axiomatic retrieval function is
> much better than the default function. The axiomatic retrieval function is
> able to find more relevant documents and users can see more relevant
> documents in the top-ranked documents. Incorporating such a state-of-the-art
> retrieval function could improve the search performance of all the
> applications which were built upon Lucene.
> Most changes related to the implementation are made in AXSimilarity,
> TermScorer and TermQuery.java. However, many test cases are hand coded to
> test whether the implementation of the default function is correct. Thus, I
> also made the modification to many test files to make the new retrieval
> function pass those cases. In fact, we found that some old test cases are not
> reasonable. For example, in the testQueries02 of TestBoolean2.java,
> the query is "+w3 xx", and we have two documents "w1 xx w2 yy w3" and
> "w1 w3 xx w2 yy w3".
> The second document should be more relevant than the first one, because it
> has more occurrences of the query term "w3". But the original test case would
> require us to rank the first document higher than the second one, which is
> not reasonable.

-- 
This message is automatically generated by JIRA.
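The quoted description argues that, for the query "+w3 xx", a document with two occurrences of "w3" should outrank one with a single occurrence. The patch's AXSimilarity is not reproduced here. As an illustration only, the Java sketch below implements F2-EXP, one of the axiomatic functions published by Fang and Zhai; that this is the function the patch uses is an assumption, as are the parameter values k = 0.35 and s = 0.5. The class and method names are hypothetical, the two TestBoolean2 documents are treated as the entire collection, and the required/optional distinction of "+w3" is ignored because both documents contain both terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the F2-EXP axiomatic scoring function (not the patch's AXSimilarity). */
public class AxiomaticSketch {
    static final double K = 0.35; // assumed exponent on the IDF-like factor N/df
    static final double S = 0.5;  // assumed strength of document-length normalization

    /** Term counts for a bag of tokens. */
    public static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> c = new HashMap<>();
        for (String t : tokens) {
            c.merge(t, 1, Integer::sum);
        }
        return c;
    }

    /** Document frequency of every term in a small in-memory collection. */
    public static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> d : docs) {
            for (String t : new HashSet<>(d)) { // count each document once per term
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    /** score(Q,D) = sum over t in Q and D of c(t,Q) * (N/df)^K * c(t,D) / (c(t,D) + S + S*|D|/avdl). */
    public static double score(List<String> query, List<String> doc, int numDocs,
                               Map<String, Integer> df, double avgDocLen) {
        Map<String, Integer> queryTf = counts(query);
        Map<String, Integer> docTf = counts(doc);
        double total = 0.0;
        for (Map.Entry<String, Integer> e : queryTf.entrySet()) {
            Integer tf = docTf.get(e.getKey());
            if (tf == null) {
                continue; // query terms missing from the document add nothing
            }
            double idf = Math.pow((double) numDocs / df.get(e.getKey()), K);
            double tfPart = tf / (tf + S + S * doc.size() / avgDocLen); // grows with tf, but saturates
            total += e.getValue() * idf * tfPart;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> doc1 = List.of("w1", "xx", "w2", "yy", "w3");
        List<String> doc2 = List.of("w1", "w3", "xx", "w2", "yy", "w3");
        List<List<String>> docs = List.of(doc1, doc2);
        Map<String, Integer> df = docFreqs(docs);
        double avgDocLen = (doc1.size() + doc2.size()) / 2.0;
        List<String> query = List.of("w3", "xx");
        System.out.println("doc1 = " + score(query, doc1, docs.size(), df, avgDocLen));
        System.out.println("doc2 = " + score(query, doc2, docs.size(), df, avgDocLen));
    }
}
```

Under these assumptions doc1 scores 44/43 (about 1.023) and doc2 about 1.146. The extra occurrence of "w3" outweighs the slightly longer length of the second document, which is the ranking the issue description argues for. The saturating tf term is the design point: a second occurrence helps, but each further occurrence helps less.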