On May 21, 2008, at 10:23 PM, Grant Ingersoll wrote:


On May 21, 2008, at 8:26 AM, Stephen Green wrote:

Grant Ingersoll wrote:

Cool, hadn't seen that.

Hi folks. Long time lurker (in RSS), first time mailer. I just wanted to say that (obviously) I think this is a great idea and we should try to push it a little further along. I posted a bit more about it in my blog this morning:

http://blogs.sun.com/searchguy/entry/open_source_trec_trecmentum

The practical upshot: I'd be more than happy to participate in this and to try to get data sources and queries from Sun or elsewhere. I'd also be up for trying to find some place to host the collections and maybe even try to figure out some way that we could get computing resources to run the evaluations. No guarantees on that (I'm sure a Sun lawyer's ears are burning somewhere right now, just for me having said that!), but I'm willing to tilt at that windmill.

I don't think we want to be in the collection business. It is a lot of work and raises a serious number of legal issues. I am just proposing we come up w/ questions and judgments for already existing, freely available collections. There are plenty of them out there; we just need some scripts, etc., to make them easy for people to download, as we already do with Wikipedia.

The problem I see in relying on collections that are held elsewhere is that they could go away at any time, and there goes all our investment in creating evaluations. I'm willing to take a crack at the folks here to see if we could get permission (and lawyer approval?) for hosting some collections.

Wikipedia's a pretty easy one to start with, then the OpenSolaris mailing lists (probably just as easy: we already host them and I know some of the folks involved), then maybe a blog crawl and a small Web crawl (anyone got a Nutch going anywhere?)

I'm pretty sure that we could do an evaluation wiki on wikis.sun.com. I like the idea you gave in your blog of having to submit source code for the runs if you want to put up your results. Missing source code is indeed one of the most aggravating things about implementing search algorithms described in papers, and requiring it would definitely drive everyone forward.

TREC had a huge impact on the academic and commercial IR communities and I think an OSTREC (see, it's already got a cool acronym!) could benefit all of us (it would give us bragging rights if nothing else :-)


Cool name, don't care much about bragging rights, just want to spur on further improvements in scoring, etc.

OK, OSTREC it is. I'll start talking to my management (being in the Labs makes this a little easier) and I'll try not to brag too much if you (all) won't!

Steve
--
Stephen Green                      //   [EMAIL PROTECTED]
Principal Investigator             \\   http://blogs.sun.com/searchguy
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692



