Benchmarks Enhancements (precision/recall, TREC, Wikipedia)
-----------------------------------------------------------

                 Key: LUCENE-836
                 URL: https://issues.apache.org/jira/browse/LUCENE-836
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Other
            Reporter: Grant Ingersoll
            Priority: Minor


It would be great if the benchmark contrib had a way of providing
precision/recall benchmark information a la TREC.  I don't know what the
copyright issues are for the TREC queries/data (I think the queries are
available, but I'm not sure about the data), so I'm not sure this is even
feasible, but I could imagine we could at least incorporate support for it for
those who have access to the data.  It has been a long time since I have
participated in TREC, so perhaps someone more familiar w/ the latest can fill
in the blanks here.
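
To make the request a bit more concrete, here is a minimal sketch of the kind
of numbers I have in mind, assuming we already have a ranked list of document
IDs for a query and a set of IDs judged relevant for it.  The class name
PrecisionRecall, its methods, and the string doc IDs are all hypothetical;
none of this is existing Lucene or benchmark contrib API, and it does not read
any TREC file format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class PrecisionRecall {

    /** Counts how many of the first k ranked doc IDs are judged relevant. */
    static int relevantRetrieved(List<String> ranked, Set<String> relevant, int k) {
        int limit = Math.min(k, ranked.size());
        int hits = 0;
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    /**
     * Precision over the top k: relevant retrieved / documents actually
     * returned in the top k. Returns 0 if nothing was retrieved.
     */
    static double precisionAt(List<String> ranked, Set<String> relevant, int k) {
        int retrieved = Math.min(k, ranked.size());
        if (retrieved <= 0) {
            return 0.0;
        }
        return (double) relevantRetrieved(ranked, relevant, k) / retrieved;
    }

    /**
     * Recall over the top k: relevant retrieved / all judged-relevant docs.
     * Returns 0 if there are no relevance judgments for the query.
     */
    static double recallAt(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) relevantRetrieved(ranked, relevant, k) / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up example data: four results returned, three docs judged relevant.
        List<String> ranked = Arrays.asList("d1", "d2", "d3", "d4");
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d3", "d5"));

        System.out.println("precision@4 = " + precisionAt(ranked, relevant, 4));
        System.out.println("recall@4    = " + recallAt(ranked, relevant, 4));
    }
}
```

In a real benchmark the ranked list would presumably come from running each
query against the index, and the relevant set from whatever judgments the user
has access to; averaging these per-query numbers over a query set is left out
here.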

Another option is to ask for volunteers to create queries and make relevance
judgments for the Reuters data, but that is a bit more complex and probably not
necessary.  Even so, an Apache-licensed set of benchmarks may be useful for the
community as a whole.  Hmmm....

Wikipedia might be another option, instead of Reuters, to set up as a download
for benchmarking, as it is quite large and I believe the licensing terms are
quite amenable.  Having a larger collection would be good for stressing Lucene
more and would give many users a demonstration of how Lucene handles large
collections.

At any rate, this kind of information could be useful for people looking at 
different indexing schemes, formats, payloads and different query strategies.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

