I hope it's not too late to vote and that my vote counts.

+1

Some comments on the collection side: why not use a US/EU patent collection? I believe it is freely available, or am I wrong? At the very least, it could probably be licensed under less restrictive terms from somewhere. It is not the biggest collection, but it may be a good one to have.
Some reasons to have such a collection (if it can be acquired), which might spark some ideas:

1) Technical: the content statistics (term distributions etc.) are completely different from those of any other collection. It may require specific parser and tokenizer implementations.
2) Multi-language content (from national patent offices).
3) It has socio-economic benefits for enterprises and for inventors/creators/lawyers etc. The more relevant documents inventors can find, the better they can prepare their patent applications. Not to mention the patent offices and patent attorneys. Lucrative ;)
4) It's not hard to obtain expert judgements and to maintain a user group that could really focus on and devote itself to generating relevance judgements (compared to a nonsensical, old news collection).

Cheers,

Murat Yakici
Department of Computer & Information Sciences
University of Strathclyde
Glasgow, UK
-------------------------------------------
The University of Strathclyde is a charitable body, registered in Scotland, with registration number SC015263.

> I'd like to call a vote on adding the ORP as an official Lucene
> subproject per the proposal at
> http://wiki.apache.org/lucene-java/OpenRelevance
> with the committers specified on the Wiki page.
>
> [] +1 - Yes, I love it
> [] 0 - I don't care
> [] -1 - I don't love it
>
> Thanks,
> Grant
>