Hi,

> My question: Does Lucene use TF/IDF to compute this? (Which would mean it does not use the Boolean model for the Boolean query...)
Lucene does indeed use TF/IDF, with length normalization for fields and documents. However, Lucene is "downward compatible" with the Boolean Model, in which documents are represented as 0/1 vectors in the vector space. Ranking only adds weights to the elements of the result set, so a query result can still be interpreted in terms of a propositional (Boolean) model: if a document appears in the result, its tokens evaluate the query (which is really a propositional formula over words and phrases) to true.

Lucene represents documents in a richer way than the Boolean Model requires, and as a result it can efficiently handle phrase and proximity searches. These are compatible extensions, though: if you can do it in the Boolean Model, you can do it in Lucene :)

One place where Lucene is not 100% compatible with a basic Boolean Model is full negation, which is a bit tricky. You cannot simply ask for all documents that do not contain a certain term unless you also have some term that appears in all documents. Not a big deal, really.

If TF/IDF weighting is a problem for you, a custom Similarity implementation lets you remove all references to length normalization and document frequencies.

Best regards from Saarbrücken

--
Dr.-Ing. Karsten Konrad
Head of Artificial Intelligence Lab
XtraMind Technologies GmbH
Stuhlsatzenhausweg 3
D-66123 Saarbrücken
Phone: +49 (681) 3025113
Fax: +49 (681) 3025109
[EMAIL PROTECTED]
www.xtramind.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, 1 December 2003 13:11
To: [EMAIL PROTECTED]
Subject: Real Boolean Model in Lucene?

Hi,

is it possible to use a real Boolean model in Lucene for searching? When one uses the QueryParser with a Boolean query (e.g. "dog AND horse"), one gets a list of documents from the Hits object. However, these documents have a ranking (score).
My question: Does Lucene use TF/IDF to compute this? (Which would mean it does not use the Boolean model for the Boolean query...) How can one run a Boolean-model search in which every result has score = 1? Example?

Cheers,
Ralph

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
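The following is a minimal, self-contained Java sketch of the pure Boolean model discussed in the reply above. It does not use Lucene. The class `BooleanModelSketch`, its methods, and the marker term `__all__` are hypothetical names chosen for illustration.

Documents are reduced to sets of terms, and queries are evaluated as set operations on posting lists. Every hit receives the same score of 1.0. The sketch also mirrors the negation workaround from the reply: a query consisting only of NOT needs a term that occurs in every document, so each document is indexed with an extra marker term.

```java
import java.util.*;

/**
 * Hypothetical toy index (NOT Lucene) illustrating the pure Boolean model:
 * documents are sets of terms, queries are set operations on posting lists,
 * and every matching document gets the same score, 1.0f.
 */
public class BooleanModelSketch {
    /** Hypothetical marker term indexed into every document so that pure negation works. */
    static final String ALL_DOCS = "__all__";

    // term -> ids of the documents containing it (the posting list)
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    private int numDocs = 0;

    /** Lowercases, splits on non-word characters, indexes the tokens, returns the new doc id. */
    public int addDocument(String text) {
        int id = numDocs++;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
            }
        }
        // The workaround from the mail: one term that appears in all documents.
        postings.computeIfAbsent(ALL_DOCS, t -> new TreeSet<>()).add(id);
        return id;
    }

    /** Copy of the posting list for a term (empty if the term is unknown). */
    public SortedSet<Integer> term(String t) {
        return new TreeSet<>(postings.getOrDefault(t.toLowerCase(), Collections.emptySortedSet()));
    }

    /** a AND b: intersection of the posting lists. */
    public static SortedSet<Integer> and(SortedSet<Integer> a, SortedSet<Integer> b) {
        SortedSet<Integer> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    /** a OR b: union of the posting lists. */
    public static SortedSet<Integer> or(SortedSet<Integer> a, SortedSet<Integer> b) {
        SortedSet<Integer> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    /** a AND NOT b: removing documents from an existing result needs no universe. */
    public static SortedSet<Integer> andNot(SortedSet<Integer> a, SortedSet<Integer> b) {
        SortedSet<Integer> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    /** Pure NOT b: only possible because ALL_DOCS supplies the set of all documents. */
    public SortedSet<Integer> not(SortedSet<Integer> b) {
        return andNot(term(ALL_DOCS), b);
    }

    /** Boolean-model "ranking": every hit scores exactly 1.0f, in document-id order. */
    public static Map<Integer, Float> score(SortedSet<Integer> hits) {
        Map<Integer, Float> scores = new LinkedHashMap<>();
        for (int id : hits) {
            scores.put(id, 1.0f);
        }
        return scores;
    }

    public static void main(String[] args) {
        BooleanModelSketch idx = new BooleanModelSketch();
        idx.addDocument("The dog chased the horse"); // id 0
        idx.addDocument("A dog sleeps");             // id 1
        idx.addDocument("Horse and rider");          // id 2
        idx.addDocument("A cat");                    // id 3

        System.out.println(score(and(idx.term("dog"), idx.term("horse"))));    // {0=1.0}
        System.out.println(score(or(idx.term("dog"), idx.term("horse"))));     // {0=1.0, 1=1.0, 2=1.0}
        System.out.println(score(andNot(idx.term("horse"), idx.term("dog")))); // {2=1.0}
        System.out.println(score(idx.not(idx.term("dog"))));                   // {2=1.0, 3=1.0}
    }
}
```

Without the `ALL_DOCS` marker, `not` would have nothing to subtract from. This is the same limitation the reply describes for full negation in Lucene. In Lucene itself, the reply's suggestion is to flatten the weights through a custom Similarity rather than to rebuild the index like this.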
