Hi Grant,

Our query expansion approach is quite simple: we apply pseudo-relevance feedback, where a number of the top retrieved documents are used to extract candidate terms for expanding the original query. We use TermPositions at query time to extract the term statistics needed for query expansion. To implement BM25, we used the implementation proposed by Joaquin Perez, where the avg. document length is computed at indexing time and used as a constant at query time.
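
As a rough illustration of the two ideas above (this is not the actual module code, and it does not use the Lucene API), here is a minimal plain-Java sketch. Everything in it is an assumption made for the example: the class and method names, the AVG_DOC_LENGTH value, the usual k1 = 1.2 and b = 0.75 defaults, the non-negative idf variant, and the choice of raw term counts over the feedback documents as the term-selection criterion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FeedbackSketch {

    // Hypothetical value: in the approach described above, the average
    // document length is computed once at indexing time and then used
    // as a constant at query time.
    static final double AVG_DOC_LENGTH = 120.0;
    static final double K1 = 1.2;  // common default, assumed here
    static final double B = 0.75;  // common default, assumed here

    // BM25 weight of one term in one document.
    // Uses the non-negative idf variant ln(1 + (N - df + 0.5) / (df + 0.5)).
    static double bm25(int tf, int docLength, int docFreq, int numDocs) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double lengthNorm = K1 * (1.0 - B + B * docLength / AVG_DOC_LENGTH);
        return idf * (tf * (K1 + 1.0)) / (tf + lengthNorm);
    }

    // Pseudo-relevance feedback: treat the top retrieved documents as
    // relevant, count the terms in them that are not already in the query,
    // and return the most frequent ones as expansion candidates.
    static List<String> expansionTerms(List<List<String>> topDocs,
                                       Set<String> queryTerms,
                                       int maxTerms) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : topDocs) {
            for (String term : doc) {
                if (!queryTerms.contains(term)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        List<String> candidates = new ArrayList<>(counts.keySet());
        // Highest count first; ties are broken alphabetically so the
        // result is deterministic.
        candidates.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(maxTerms, candidates.size())));
    }

    public static void main(String[] args) {
        List<List<String>> topDocs = List.of(
                List.of("lucene", "index", "search"),
                List.of("lucene", "index", "ranking"));
        System.out.println(expansionTerms(topDocs, Set.of("lucene"), 2));
        System.out.println(bm25(2, 100, 5, 1000));
    }
}
```

In a real Lucene setup the term counts would come from the index (for example via TermPositions, as mentioned above) instead of in-memory token lists, and the candidates would normally be weighted by a proper term-selection measure before being added to the query.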

We know that this is not the best way to do it, but we do not know Lucene well enough to write a better implementation. Our code is for research projects only, and it is not robust enough for real-world projects.

I have not followed any of the flexible indexing discussions, but the subject sounds very interesting to me. Do you have any pointers?

Finally, the code has been released under the Apache License. We are not experts in this kind of thing, but our idea is that everyone can take the code, change it, and use it without constraints.

Best regards

Jose

José Ramón Pérez Agüera

On 22/10/2008, at 20:16, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

Hi José,

Can you explain your implementation approach? I'm curious how you incorporated the avg. doc length. Also, have you followed any of the flexible indexing discussions?

Finally, what's the license on this code?

Thanks,
Grant

On Oct 21, 2008, at 10:14 AM, José Ramón Pérez Agüera wrote:

Hello,

We have implemented a research module for Lucene using BM25 and our
structured version of BM25 as ranking functions, and a couple of
state-of-the-art query expansion algorithms.

This implementation is quite different from other query expansion
modules for Lucene that are available on the web.

We have used this implementation in CLEF 2008 Robust WSD track
http://ixa2.si.ehu.es/clirwsd/ with very good results. The code and
other technical and scientific details can be accessed here:

http://grasia.fdi.ucm.es/jose/query-expansion

This is an alpha0.0.never.used.before.by.anyone version; we will work
on it in the coming months to make it easier to use.

Please do not hesitate to send comments, questions and/or complaints
to the authors.

jose

--
José Ramón Pérez Agüera

Dept. de Ingeniería del Software e Inteligencia Artificial
Despacho 411 tlf. 913947599
Facultad de Informática
Universidad Complutense de Madrid

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









