Not sure if this was mentioned before, but I was going to point out 
http://index.isc.org/ (see 
http://ioiblog.wordpress.com/2008/11/07/kicking-off-the-ioi-blog/ ). That 
server doesn't seem to be listening, though, so see here instead: 
http://ioiblog.wordpress.com/2009/02/

Perhaps we can get data from Dennis and Jeremie?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Ted Dunning <ted.dunn...@gmail.com>
> To: general@lucene.apache.org
> Sent: Wednesday, May 13, 2009 2:48:43 PM
> Subject: Re: Open Relevance Project?
> 
> Crawling a reference dataset requires essentially one-time bandwidth.
> 
> Also, it is possible to download, say, wikipedia in a single go.  Likewise
> there are various web-crawls that are available for research purposes (I
> think).  See http://webascorpus.org/ for one example.  These would be single
> downloads.
> 
> I don't entirely see the point of redoing the spidering.
> 
> On Wed, May 13, 2009 at 10:56 AM, Grant Ingersoll wrote:
> 
> > Good point, although you never know.  We also will have some bandwidth reqs
> > for crawling.
> >
> >
> 
> 
> -- 
> Ted Dunning, CTO
> DeepDyve
