karl wettin wrote:
> On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>> it is always good to have query logs
>
> I realize that it is not that politically correct, but the TPB
> collection is released to the public domain and contains 3.2 million
> user queries with session id, timestamp, category etc to go with the
> 150,000+500,000 documents.
>
> http://thepiratebay.org/tor/3783572

That's a good find! They use Lucene too!

I don't see any legal issues with us writing code that parses these files.
To be safest, I don't think we should republish the files, or even any
of the queries, but we shouldn't need to. Folks can download them to
their own machines and use them for testing there.

It doesn't look as though there's click data, so we can't use this for
relevance experiments without manually creating judgments. But for
performance benchmarking it could be useful.

Doug