Hi Grant,
Our query expansion approach is quite simple, we apply pseudo-
relevance feedback techniques, where a number of top retrieved
documents are used to extract the terms candidates to expand the
original query. We have used TermPositions in query time to extract
the term statistics necessaries for query expansion. On the other
hand, to implement BM25, we have used the implementation propoused by
Joaquin perez, where avg. Length is computed in indexing time and it
is used as a constant in query time.
We know that this is not the best way to do that, but we don't have
the knowledge about lucene to made a better implementation. Our code
is for research projects only, and it is not robust enougth for real-
world projects.
I have not followed any of the flexible indexing discussions, but this
subject seems to be very interesting to me. Do you have any pointers?.
Finally, the code has been reliased under apache license. But we are
not experts on this kind of things. Our idea is that everyone can take
the code, change it and use it without constraints.
Best regards
Jose
José Ramón Pérez Agüera
On 22/10/2008, at 20:16, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Hi José,
Can you explain your approach to implementing? I'm curious how you
incorporated in the avg. doc length. Also, have you followed any of
the flexible indexing discussions?
Finally, what's the license on this code?
Thanks,
Grant
On Oct 21, 2008, at 10:14 AM, José Ramón Pérez Agüera wrote:
Hello,
We have implemented a research module for lucene using BM25 and our
structured version of BM25 as ranking functions and a couple of
state-of-art query expansion algoritms.
This implementation is quite different to other query expansion
modules for Lucene that are available in the web.
We have used this implementation in CLEF 2008 Robust WSD track
http://ixa2.si.ehu.es/clirwsd/ with very good results. The code and
other technical and scientific details can be accessed here:
http://grasia.fdi.ucm.es/jose/query-expansion
This is an alpha0.0.never.used.before.by.anyone version, we will work
on it in the following months to make it easier to use.
Please do not hesitate to send comments, questions and/or complaints
to the authors.
jose
--
José Ramón Pérez Agüera
Dept. de Ingeniería del Software e Inteligencia Artificial
Despacho 411 tlf. 913947599
Facultad de Informática
Universidad Complutense de Madrid
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]