+1. Let's not get ahead of ourselves w/ changing the world or
anything like that. First and foremost, we need this for Lucene; if
others benefit, so be it. You are right on in that we need a shared,
free way of judging whether Lucene is improving on relevance (even if
it is already very good out of the box). Otherwise, we can't even
have the conversation. For instance, it would help in evaluating the
Axiomatic patch in JIRA or the SweetSpot stuff or a whole host of
other things (for instance, our current length norm tends to favor
shorter docs; is this the right default?).
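To make the length norm point concrete, here is a minimal sketch, assuming the default norm has the 1/sqrt(numTerms) form used by DefaultSimilarity.lengthNorm. The class name LengthNormSketch and the sample lengths are made up for illustration, and the sketch ignores that Lucene stores norms in a single byte, which loses precision.

```java
public class LengthNormSketch {

    // Assumed form of the default length norm: 1 / sqrt(number of terms in the field).
    // This is a standalone illustration, not a call into Lucene itself.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Hypothetical field lengths: a title-sized field, a short doc, a long doc.
        int[] lengths = {4, 100, 10000};
        for (int n : lengths) {
            System.out.println(n + " terms -> norm " + lengthNorm(n));
        }
    }
}
```

With everything else equal, a match in the 100-term doc gets a norm ten times larger than the same match in the 10,000-term doc, which is the bias toward shorter docs being questioned here.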
On May 18, 2009, at 11:00 PM, Mark Miller wrote:
Grant Ingersoll wrote:
Some interesting discussion at
http://thenoisychannel.com/2009/05/18/copying-trec-is-the-wrong-track-for-the-enterprise/
That was an interesting read. I think a lot of the argument misses
the point. It doesn't seem to me that the main benefit or intent
comes from 'bake offs' with other search engines ("Selling search
applications to enterprises isn't, in my experience, about winning
relevance bake-offs.") - the main benefit is allowing us to measure
changes and improvements to Lucene's relevancy calculations and to
make judgments about how Lucene currently performs. I see it as easily
as important as the Lucene benchmark contrib. It's not going to be a
secret sauce, just like the benchmarker has been no secret sauce -
but it's going to make it easier to reliably improve Lucene in the
future.
- Mark
On May 18, 2009, at 1:57 PM, Grant Ingersoll wrote:
On May 18, 2009, at 11:41 AM, Ted Dunning wrote:
On the other hand, it is likely that we could find query and
click logs for the documentation.
Only if they are redacted/aggregated first. ASF Members have
access, but we'd need to get permission to distribute (after
redaction/aggregation), I suspect. Given the AOL marketing
fiasco, we'd have to go over them in pretty good detail before
releasing to make sure there is no personal information. AFAIK,
I'm the only ASF Member who has so far volunteered on this thread,
and I highly doubt I have the time for what I imagine to be a
pretty decent-sized endeavor.
Stripping IP addresses is pretty straightforward, but the query
terms might be a bit more involved.
Still, can't hurt to find out what's involved.
-Grant
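As a rough sketch of the "straightforward" half, assuming the logs are plain text lines with dotted-quad IPv4 addresses in them: the class name, the log line layout, and the "x.x.x.x" placeholder below are all hypothetical. It does nothing about personal information inside the query terms themselves, which is the harder part.

```java
import java.util.regex.Pattern;

public class LogRedactionSketch {

    // Matches dotted-quad IPv4-looking tokens. It does not validate octet ranges,
    // so any four-part dotted number (e.g. some version strings) is masked too.
    private static final Pattern IPV4 =
            Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");

    // Replaces every IPv4-looking token in the line with a fixed placeholder.
    static String stripIp(String line) {
        return IPV4.matcher(line).replaceAll("x.x.x.x");
    }

    public static void main(String[] args) {
        // Hypothetical log line layout: client address followed by the query.
        String line = "192.168.0.1 q=lucene scoring";
        System.out.println(stripIp(line));
    }
}
```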
--
- Mark
http://www.lucidimagination.com
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/