I like it too. And I'm wondering what the response to this will be -- in a
way, it will show whether TREC really lives up to its mission, won't it?
D.
Grant Ingersoll wrote:
How does this sound:
Dear ----,
My name is Grant Ingersoll and I am a committer on the Lucene Java search
library (http://lucene.apache.org) at the Apache Software Foundation
(ASF). I am not, however, writing in any official capacity as a
representative of the ASF. Perhaps at a later date, this will change,
but for now I just want to keep things informal.
I am, however, interested in starting a discussion about how open source
projects like Lucene could participate in future TREC evaluations, or at
least gain access to TREC data resources. While the people involved in
Lucene feel we have built a top-notch search system, one of the things
the community as a whole lacks is the ability to do formal evaluations
like TREC offers, and thus research and development of new algorithms is
hindered. Granted, individuals may perform TREC evaluations provided they
have purchased a license to the data, but the community as a whole does
not have this ability.
I am wondering if there is some way in which we can arrange for open
source projects to obtain access to the TREC collections. The biggest
barrier for projects like Lucene, obviously, is the fee that needs to be
paid. Furthermore, there are undoubtedly distribution and copyright
concerns. Yet, a part of me feels that we can work something out
through creative licensing or some other novel approach that protects
the appropriate interests, furthers TREC's mission and supports the
vibrant Open Source community around Lucene and other search engines.
Perhaps it would be possible to require that any participant who wants
the TREC data must prove that they are appropriately affiliated with an
official open source project, as defined by the Open Source Initiative
(http://www.opensource.org). Many tool vendors have similar licenses
that allow open source participants to use their tool while working on
open source projects[1]. Perhaps we could provide a similar approach to
the TREC data.
I feel this would benefit TREC substantially by providing an open
baseline system for all the world to see, and it fits well with the
motto of TREC: "...to encourage research in information retrieval from
large text collections." Naturally, it also benefits Lucene by allowing
Lucene to undertake more formal evaluation of relevance, etc.
If you are interested in more background, please refer to this
discussion on the Lucene Java developers mailing list:
http://www.gossamer-threads.com/lists/lucene/java-dev/52022?search_string=TREC;#52022
I look forward to hearing back from you and I would be more than happy
to answer any questions you have.
Sincerely,
Grant Ingersoll
[1] JetBrains, Atlassian, Clover Test Coverage, etc.
-------
-Grant
On Aug 10, 2007, at 4:52 AM, Tom White wrote:
Furthermore, I think it would
encourage Lucene users/developers to think about relevance as much as
we think about speed.
+1
However, I think it would be much better to start by making informal
approaches as you suggest - the open letter seems to me to be
appropriate only as a last resort.
Tom
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]