Hello Karsten,

that is fine with me. An implementation can never be matched 100% to a theory, as the ISO OSI model has shown perfectly. :-) That's OK for me, and I want to thank you again for the clarification I gained from this conversation.
Cheers

> Hello Ralf,
>
> >>
> According to your description, Lucene basically maps the boolean query
> into the vector space and measures the cosine similarity towards other
> documents in the vector space. If I understood you correctly, you mean
> that if a document is found by Lucene based on a boolean query, it is
> relevant (boolean true). If it is not returned, it was boolean false.
> The score sits on top of it and can be used for ranking. If I would like
> to use a true boolean model, I would therefore just need to ignore the
> score of the Hits document. Did I understand correctly?
> >>
>
> Yes, I think that this is indeed pretty close to some theoretical
> foundation: the Boolean Model explains which documents fit a query,
> while some appropriate (Lucene is good!) similarity function in vector
> space yields the ranking.
>
> Now hell would be the place for me where I would have to prove that
> Lucene's ranking is exactly equivalent to some transformation of vector
> space and then using the *cosine* for the ranking. It can't be, really,
> as Lucene sometimes returns scores > 1.0 and only some ruthless
> normalisation keeps them within 0.0 to 1.0. In other words, there still
> are some rough corners in Lucene where a good theorist could find some
> work.
>
> Could we leave this topic aside until some suicid.. err, I mean
> enthusiastic fellow tries to work out a really good theory?
>
> Regards,
>
> Karsten
>
> -----Original Message-----
> From: Ralf B [mailto:[EMAIL PROTECTED]
> Sent: Monday, 1 December 2003 14:28
> To: Lucene Users List
> Subject: Re: AW: Real Boolean Model in Lucene?
>
> Hi Karsten,
>
> I want to thank you for your qualified answer as well as your answer
> from the 14th of November, where you agreed with me that Lucene is
> basically a VSM implementation. Sometimes it is difficult to make the
> link between the clear theory and its implementation.
>
> According to your description, Lucene basically maps the boolean query
> into the vector space and measures the cosine similarity towards other
> documents in the vector space. If I understood you correctly, you mean
> that if a document is found by Lucene based on a boolean query, it is
> relevant (boolean true). If it is not returned, it was boolean false.
> The score sits on top of it and can be used for ranking. If I would like
> to use a true boolean model, I would therefore just need to ignore the
> score of the Hits document. Did I understand correctly?
>
> I agree that nobody really wants to do that. My question was intended to
> find out more about the theory implemented within Lucene.
>
> Cheers,
> Ralph
>
> > Hi,
> >
> > >>
> > My Question: Does Lucene use TF/IDF for getting this? (which would
> > mean it does not use the boolean model for the boolean query...)
> > >>
> >
> > Lucene indeed uses TF/IDF with length normalization for fields and
> > documents.
> >
> > However, Lucene is "downward compatible" with the Boolean Model, where
> > documents are represented as 0/1-vectors in Vector Space. Ranking just
> > adds weights to the elements of the result set, so the underlying
> > interpretation of a query result can still be that of a
> > Propositional/Boolean model. If a document appears in the result, its
> > tokens evaluate the query (which actually is a propositional formula
> > formed over words and phrases) to true.
> > The representation of documents is more complex in Lucene than
> > required for the Boolean Model, and as a result, Lucene can
> > efficiently handle phrases and proximity searches, but these seem to
> > be compatible extensions: if you can do it in the Boolean Model, you
> > can do it in Lucene :)
> >
> > One place where Lucene is not 100% compatible with a basic Boolean
> > Model is that full negation is a bit tricky: you cannot simply ask for
> > all documents that do not contain a certain term unless you also have
> > some term that appears in all documents. Not a big deal, really.
> >
> > If TF/IDF weighting is a problem for you, the Similarity interface
> > implementation allows you to remove all references to length
> > normalization and document frequencies.
> >
> > Regards,
> >
> > Kind regards from Saarbrücken
> >
> > --
> >
> > Dr.-Ing. Karsten Konrad
> > Head of Artificial Intelligence Lab
> >
> > XtraMind Technologies GmbH
> > Stuhlsatzenhausweg 3
> > D-66123 Saarbrücken
> > Phone: +49 (681) 3025113
> > Fax: +49 (681) 3025109
> > [EMAIL PROTECTED]
> > www.xtramind.com
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Sent: Monday, 1 December 2003 13:11
> > To: [EMAIL PROTECTED]
> > Subject: Real Boolean Model in Lucene?
> >
> > Hi,
> >
> > is it possible to use a real boolean model in Lucene for searching?
> > When one is using the QueryParser with a boolean query (e.g. "dog AND
> > horse"), one gets a list of documents from the Hits object. However,
> > these documents have a ranking (score).
> >
> > My Question: Does Lucene use TF/IDF for getting this? (which would
> > mean it does not use the boolean model for the boolean query...)
> >
> > How can one use a boolean model search, where the outcomes all have
> > score=1? Example?
> >
> > Cheers,
> > Ralph
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
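
For anyone reading this thread later, here is a minimal sketch of the idea discussed above, written in plain Python and deliberately not against Lucene's API. It assumes a toy in-memory index; the corpus, the function names (build_index, boolean_and, tfidf_score, search) and the scoring formula are all invented for illustration and are not Lucene's actual ranking. The point is only that the boolean part decides which documents are in the result set, while the TF/IDF part merely orders that set, so dropping the scores leaves a pure Boolean result.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, invented for this sketch).
DOCS = {
    1: "dog horse horse stable",
    2: "dog cat",
    3: "horse dog dog dog farm field barn",
    4: "cat mouse",
}

def build_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = {}
    for doc_id, text in docs.items():
        for term, freq in Counter(text.split()).items():
            index.setdefault(term, {})[doc_id] = freq
    return index

INDEX = build_index(DOCS)

def boolean_and(terms, must_not=()):
    """Boolean model: the set of documents for which the query is true.

    Prohibited terms are subtracted from the positive matches, so a query
    with no positive term yields the empty set (the negation caveat from
    the thread).
    """
    postings = [set(INDEX.get(t, {})) for t in terms]
    result = set.intersection(*postings) if postings else set()
    for t in must_not:
        result -= set(INDEX.get(t, {}))
    return result

def tfidf_score(doc_id, terms):
    """Simplified TF/IDF with length normalization (an assumption, not
    Lucene's exact formula). Only called for documents matching all terms."""
    length = len(DOCS[doc_id].split())
    score = 0.0
    for t in terms:
        tf = math.sqrt(INDEX[t][doc_id])
        idf = 1.0 + math.log(len(DOCS) / (len(INDEX[t]) + 1))
        score += tf * idf * idf
    return score / math.sqrt(length)

def search(terms, ranked=True):
    """Return (doc_id, score) pairs; with ranked=False every hit scores 1.0."""
    hits = boolean_and(terms)
    if not ranked:
        return [(d, 1.0) for d in sorted(hits)]
    scored = [(d, tfidf_score(d, terms)) for d in hits]
    return sorted(scored, key=lambda hit: -hit[1])

if __name__ == "__main__":
    print(search(["dog", "horse"]))                # ranked hits
    print(search(["dog", "horse"], ranked=False))  # same hits, all score 1.0
    print(boolean_and(["dog"], must_not=["horse"]))
    print(boolean_and([], must_not=["cat"]))       # pure negation: empty
```

Both calls to search return the same documents; only the scores and the ordering differ. In Lucene itself, the thread suggests two routes to the same effect: ignore the score of each hit, or supply a Similarity implementation that neutralizes length normalization and document frequencies.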
