Hi all,

I finally found some time to finish the BM25/BM25F implementation for Lucene. You can find more details at http://nlp.uned.es/~jperezi/Lucene-BM25/. It has been tested, but I cannot guarantee that it is bug-free.
It would be great to receive some feedback about it.

There are some details of the implementation that I think will be of interest, such as how to calculate the average length (average_length) or the IDF at document level. If you find any bug or mistake in the supplied implementation, please let me know and I will try to fix it; the same goes for questions.
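
For readers who have not seen the formula, here is a minimal sketch of how the average length and IDF enter a textbook BM25 score. It is only an illustration and not the code from the page above: the class name Bm25Sketch, the in-memory token lists, the defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative BM25 over an in-memory collection (hypothetical class,
// NOT the Lucene-BM25 implementation discussed in this message).
final class Bm25Sketch {
    // Commonly used defaults; assumptions, tune them per collection.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // Average document length: total term count / number of documents.
    static double averageLength(List<List<String>> docs) {
        long total = 0;
        for (List<String> doc : docs) {
            total += doc.size();
        }
        return docs.isEmpty() ? 0.0 : (double) total / docs.size();
    }

    // Number of documents that contain the term at least once.
    static int docFreq(List<List<String>> docs, String term) {
        int df = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    // IDF with 1 added inside the log so the value is never negative.
    static double idf(int numDocs, int docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 score of one document for a bag-of-words query.
    static double score(List<String> query, List<String> doc,
                        List<List<String>> docs) {
        double avgLen = averageLength(docs);
        double score = 0.0;
        for (String term : query) {
            int tf = Collections.frequency(doc, term);
            if (tf == 0) {
                continue; // term absent: contributes nothing
            }
            // Length normalisation: longer-than-average documents are penalised.
            double norm = K1 * (1.0 - B + B * doc.size() / avgLen);
            score += idf(docs.size(), docFreq(docs, term))
                    * tf * (K1 + 1.0) / (tf + norm);
        }
        return score;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "scoring", "bm25"),
                Arrays.asList("lucene", "index"),
                Arrays.asList("search", "engine", "ranking", "model"));
        List<String> query = Arrays.asList("lucene", "bm25");
        for (List<String> doc : docs) {
            System.out.println(doc + " -> " + score(query, doc, docs));
        }
    }
}
```

In a real index the average length and document frequencies would come from collection statistics rather than from scanning token lists, which is where the details mentioned above matter.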

I hope some of you will find it useful.

Thanks in advance.



[EMAIL PROTECTED] wrote:
Hi Otis,

As my colleague said, we have a first implementation of BM25 over Lucene. This
development is part of an (almost finished) thesis project that compares
different IR models over a standard collection. At the same time, we are
trying to extend this first implementation to support BM25F for multi-field
queries. Unfortunately, we are too busy right now to prepare a final version
of the code, so we will have to finish it over the summer (hopefully we will
have more time :-))) and make it public then.

We will inform this list when we finish preparing the final version.

Thanks to everybody for the interest!!!

Bye
Joaquin

-----------------------------------------------------------
Joaquín Pérez Iglesias
Dpto. Lenguajes y Sistemas Informáticos
E.T.S.I. Informática (UNED)
Ciudad Universitaria
C/ Juan del Rosal nº 16
28040 Madrid - Spain
Phone. +34 91 398 87 25
Fax    +34 91 398 65 35
Office  2.07
Email: [EMAIL PROTECTED]
-----------------------------------------------------------

Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Hi Jose,

I was wondering if you ever got to this.  I would love to see and try BM25 for
Lucene!


I'm looking at http://code.google.com/soc/2008/asf/about.html
and it looks like this didn't make it into GSoC, but this would still be great
to have.

Thanks,
Otis
--
Sematext -- http://sematext.com/ --
Lucene - Solr - Nutch


----- Original Message ----
From: José Ramón Pérez Agüera <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org;
Joaquin Perez-Iglesias <[EMAIL PROTECTED]>
Sent: Saturday, March 15, 2008 4:54:08 AM
Subject: Re: Summer of Code idea for lucene

We have almost implemented BM25 using the Lucene structure, but we need
help to finish the query parser and other details. If you or somebody else
wants, we can send you the code so you can help us implement the query
parser and prepare the code for the sandbox.

If there are people interested, I can make a web page for the project
and put our implementation up for download.

Is anybody interested?

jose

--
José Ramón Pérez Agüera

Dept. de Ingeniería del Software e Inteligencia Artificial
Despacho 411 tlf. 913947599
Facultad de Informática
Universidad Complutense de Madrid

On Sat, Mar 15, 2008 at 5:32 AM, Ian Holsman wrote:
If no one objects (I don't think it's too late), would you mind a GSoC
project to implement BM25 relevancy/scoring?
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







--
-----------------------------------------------------------
Joaquín Pérez Iglesias
Dpto. Lenguajes y Sistemas Informáticos
E.T.S.I. Informática (UNED)
Ciudad Universitaria
C/ Juan del Rosal nº 16
28040 Madrid - Spain
Phone. +34 91 398 87 25
Fax    +34 91 398 65 35
Office  2.07
Email: [EMAIL PROTECTED]
web:   http://nlp.uned.es/~jperezi/
-----------------------------------------------------------

