karl wettin wrote:
> On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>> it is always good to have query logs
>
> I realize that it is not that politically correct, but the TPB
> collection is released to the public domain and contains 3.2 million
> user queries with session id, timestamp, category etc to go with the
> 150,000+500,000 documents.
>
> http://thepiratebay.org/tor/3783572

That's a good find!  They use Lucene too!

I don't see any legal issues with our writing code that parses these files. To be safe, I don't think we should republish the files, or even any of the queries, and I don't think we need to. Folks can download them to their own machines and use them for testing there.

It doesn't look as though there's click data, so we can't use this for relevance experiments without manually creating relevance judgments. But for performance benchmarking it could be useful.
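
To make the benchmarking idea concrete, here is a rough sketch of parsing a locally downloaded log and replaying its queries through a search function. The file layout is purely an assumption (one tab-separated line per query, in the order session id, timestamp, category, query text); the actual files may look quite different. The names LogEntry, parse_log and replay are made up for illustration, and `search` stands in for whatever function runs a query against your own index.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    session_id: str
    timestamp: str
    category: str
    query: str


def parse_log(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per well-formed line and skip everything else.

    ASSUMPTION: fields are tab-separated in the order
    session_id, timestamp, category, query. Adjust to the real format.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip lines with the wrong field count or an empty query.
        if len(fields) != 4 or not fields[3].strip():
            continue
        yield LogEntry(*fields)


def replay(entries: Iterable[LogEntry], search: Callable[[str], object]) -> dict:
    """Run every query through `search` and report the count and wall time."""
    count = 0
    start = time.perf_counter()
    for entry in entries:
        search(entry.query)  # hypothetical hook: call your searcher here
        count += 1
    elapsed = time.perf_counter() - start
    return {"queries": count, "seconds": elapsed}
```

Since the log is only read from the local machine and nothing but the timing figures is reported, a harness along these lines would not need to republish any of the queries.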

Doug

