On May 21, 2008, at 10:23 PM, Grant Ingersoll wrote:
On May 21, 2008, at 8:26 AM, Stephen Green wrote:
Grant Ingersoll wrote:
Cool, hadn't seen that.
Hi folks. Long time lurker (in RSS), first time mailer. I just
wanted to say that (obviously) I think this is a great idea and we
should try to push it a little further along. I posted a bit more
about it in my blog this morning:
http://blogs.sun.com/searchguy/entry/open_source_trec_trecmentum
The practical upshot: I'd be more than happy to participate in
this and to try to get data sources and queries from Sun or
elsewhere. I'd also be up for trying to find some place to host
the collections and maybe even try to figure out some way that we
could get computing resources to run the evaluations. No
guarantees on that (I'm sure a Sun Lawyer's ears are burning
somewhere right now, just for me having said that!), but I'm
willing to tilt at that windmill.
I don't think we want to be in the collection business. It is a lot
of work and raises a serious number of legal issues. I am just proposing
we come up w/ questions and judgments for already existing, freely
available collections. There are plenty of them out there; we just
need some scripts, etc. to make them easy for people to download, like
we do already with Wikipedia.
The problem I see in relying on collections that are held
elsewhere is that they could go away at any time and there goes all
our investment in creating evaluations. I'm willing to take a crack
at the folks here to see if we could get permission (and lawyer
approval?) for hosting some collections.
Wikipedia's a pretty easy one to start with, then the OpenSolaris
mailing lists (probably just as easy: we already host them and I know
some of the folks involved), then maybe a blog crawl and a small Web
crawl (anyone got a Nutch going anywhere?)
I'm pretty sure that we could do an evaluation wiki on wikis.sun.com.
I like the idea you gave in your blog of having to submit source code
for the runs if you want to put up your results. This is indeed one
of the most aggravating things about implementing search algorithms
described in papers and it would definitely drive everyone forward.
TREC had a huge impact on the academic and commercial IR
communities and I think an OSTREC (see, it's already got a cool
acronym!) could benefit all of us (it would give us bragging rights
if nothing else :-)
Cool name, don't care much about bragging rights, just want to spur
on further improvements in scoring, etc.
OK, OSTREC it is. I'll start talking to my management (being in the
Labs makes this a little easier) and I'll try not to brag too much if
you (all) won't!
Steve
--
Stephen Green // [EMAIL PROTECTED]
Principal Investigator \\ http://blogs.sun.com/searchguy
Aura Project // Voice: +1 781-442-0926
Sun Microsystems Labs \\ Fax: +1 781-442-1692