It is perfect :-) I think it might be worth sending a CC to the LDC, because I believe they hold some kind of rights over the TREC collections.
http://trec.nist.gov/data/docs_eng.html
http://www.ldc.upenn.edu/

jose

On 8/20/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> How does this sound:
>
> Dear ----,
>
> My name is Grant Ingersoll and I am committer on the Lucene Java
> search library (http://lucene.apache.org) at the Apache Software
> Foundation (ASF). I am not, however, writing in any official
> capacity as a representative of the ASF. Perhaps at a later date,
> this will change, but for now I just want to keep things informal.
>
> I am, however, interested in starting a discussion about how open
> source projects like Lucene could participate in future TREC
> evaluations, or at least gain access to TREC data resources. While
> the people involved in Lucene feel we have built a top notch search
> system, one of the things the community as a whole lacks is the
> ability to do formal evaluations like TREC offers, and thus research
> and development of new algorithms is hindered. Granted, individuals
> may perform TREC evaluations given they have purchased a license to
> the data, but the community as a whole does not have this ability.
>
> I am wondering if there is some way in which we can arrange for open
> source projects to obtain access to the TREC collections. The
> biggest barrier for projects like Lucene, obviously, is the fee that
> needs to be paid. Furthermore, there are undoubtedly distribution
> and copyright concerns. Yet, a part of me feels that we can work
> something out through creative licensing or some other novel approach
> that protects the appropriate interests, furthers TREC's mission and
> supports the vibrant Open Source community around Lucene and other
> search engines. Perhaps it would be possible to require that any
> participant who wants the TREC data must prove that they are
> appropriately affiliated with an official open source project, as
> defined by the Open Source Initiative (http://www.opensource.org).
> Many tool vendors have similar licenses that allow open source
> participants to use their tool while working on open source projects
> [1]. Perhaps we could provide a similar approach to the TREC data.
>
> I feel this would benefit TREC substantially, by providing an open,
> baseline system for all the world to see and I see that it fits very
> much with the motto of TREC "...to encourage research in information
> retrieval from large text collections." Naturally, it benefits
> Lucene by allowing Lucene to undertake more formal evaluation of
> relevance, etc.
>
> If you are interested in more background on this on the Lucene Java
> developers mailing list, please refer to
> http://www.gossamer-threads.com/lists/lucene/java-dev/52022?search_string=TREC;#52022
>
> I look forward to hearing back from you and I would be more than
> happy to answer any questions you have.
>
> Sincerely,
> Grant Ingersoll
>
> [1] JetBrains, Atlassian, Clover Test Coverage, etc.
>
> -------
>
> -Grant
>
>
> On Aug 10, 2007, at 4:52 AM, Tom White wrote:
>
> >> Furthermore, I think it would
> >> encourage Lucene users/developers to think about relevance as much as
> >> we think about speed.
> >
> > +1
> >
> > However I think it would be much better to start by making informal
> > approaches as you suggest - the open letter seems to me to be
> > appropriate only as a last resort.
> >
> > Tom
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
>

--
José Ramón Pérez Agüera
Dept. de Ingeniería del Software e Inteligencia Artificial
Despacho 411 tlf. 913947599
Facultad de Informática
Universidad Complutense de Madrid