+1. Let's not get ahead of ourselves with changing the world or anything like that. First and foremost, we need this for Lucene; if others benefit, so be it. You are right on in that we need a shared, free way of judging whether Lucene is improving on relevance (even if it is already very good out of the box). Otherwise, we can't even have the conversation. For instance, it would help in evaluating the Axiomatic patch in JIRA, the SweetSpot stuff, or a whole host of other things (for instance, our current length norm tends to favor shorter docs; is this the right default?).
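To make the length norm point concrete, here is a rough sketch, assuming the 1/sqrt(numTerms) formula that DefaultSimilarity uses for its length norm. It ignores the lossy single-byte encoding applied when norms are stored, and the field lengths are made-up values for illustration only.

```python
import math


def length_norm(num_terms: int) -> float:
    """Approximate length norm: 1 / sqrt(number of terms in the field).

    Assumption: mirrors the DefaultSimilarity formula and skips the
    single-byte norm encoding, so real stored values are coarser.
    """
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return 1.0 / math.sqrt(num_terms)


# Hypothetical field lengths: the same single-term match is weighted
# progressively lower as the field gets longer.
for n in (4, 100, 2500):
    print(n, length_norm(n))
```

All else being equal, the 4-term field gets a much larger multiplier than the 2500-term one, which is the bias toward shorter docs in question.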

On May 18, 2009, at 11:00 PM, Mark Miller wrote:

Grant Ingersoll wrote:
Some interesting discussion at 
http://thenoisychannel.com/2009/05/18/copying-trec-is-the-wrong-track-for-the-enterprise/
That was an interesting read. I think a lot of the argument misses the point. It doesn't seem to me that the main benefit or intent comes from 'bake offs' with other search engines ("Selling search applications to enterprises isn't, in my experience, about winning relevance bake-offs.") - the main benefit is allowing us to measure changes and improvements to Lucene's relevancy calculations and to make judgments about how Lucene currently performs. I see it as easily as important as the Lucene benchmark contrib. It's not going to be a secret sauce, just like the benchmarker has been no secret sauce - but it's going to make it easier to reliably improve Lucene in the future.

- Mark

On May 18, 2009, at 1:57 PM, Grant Ingersoll wrote:


On May 18, 2009, at 11:41 AM, Ted Dunning wrote:

On the other hand, it is likely that we could find query and click logs for
the documentation.

Only if they are redacted/aggregated first. ASF Members have access, but I suspect we'd need to get permission to distribute (after redaction/aggregation). Given the AOL marketing fiasco, we'd have to go over them in pretty good detail before releasing to make sure there is no personal information. AFAIK, I'm the only ASF Member who has so far volunteered on this thread and I highly doubt I have the time for what I imagine to be a pretty decent-sized endeavor.

Stripping IP addresses is pretty straightforward, but the query terms might be a bit more involved.

Still, can't hurt to find out what's involved.
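As a very rough idea of the redaction/aggregation step, here is a minimal sketch. The log layout (tab-separated IP, timestamp, query) and the `min_count` threshold are assumptions made for illustration, not the actual ASF log format. Dropping rare queries only reduces the risk of leaking personal information; the output would still need a manual review before release.

```python
from collections import Counter


def aggregate_queries(log_lines, min_count=2):
    """Drop IP and timestamp, then keep queries seen at least min_count times.

    Assumption: each line is "ip<TAB>timestamp<TAB>query" (hypothetical format).
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        _ip, _timestamp, query = parts
        counts[query.strip().lower()] += 1
    # Rare queries are the most likely to identify a person, so drop them.
    return {q: c for q, c in counts.items() if c >= min_count}


sample = [
    "10.0.0.1\t2009-05-18T11:41\tIndexWriter",
    "10.0.0.2\t2009-05-18T11:42\tindexwriter ",
    "10.0.0.3\t2009-05-18T11:43\tjohn doe home address",
]
print(aggregate_queries(sample))
```

Only the normalized query string and its count survive; the one-off query is dropped along with every IP address.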

-Grant




--
- Mark

http://www.lucidimagination.com




--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
