No, I'm not trying to say weighting is driven by the index. I was trying to explain the differences (irrespective of weighting) in how TF and IDF are calculated for cts:element-query and cts:element-word-query.
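
To make the comparison concrete, here is a minimal sketch that runs the same word through both query forms and prints the scores, reusing the "bach" / dc:title example and the weight of 16 from Peter's message quoted further down. It is untested and rests on assumptions: that the dc prefix is bound to the Dublin Core elements namespace, that searching all of fn:doc() is an acceptable scope, and that the server accepts this prolog syntax (older releases may need it adjusted).

```xquery
(: Sketch only: compare scores from the two query forms for the same word.
   Assumption: the dc prefix is bound to the Dublin Core elements namespace. :)
declare namespace dc = "http://purl.org/dc/elements/1.1/";

(: Form 1: cts:element-query wrapping a weighted cts:word-query. :)
let $element-q :=
  cts:element-query(xs:QName("dc:title"), cts:word-query("bach", (), 16))

(: Form 2: cts:element-word-query with the same word and the same weight. :)
let $element-word-q :=
  cts:element-word-query(xs:QName("dc:title"), "bach", (), 16)

(: Run each query and report the first ten hits with their scores. :)
for $q in ($element-q, $element-word-q)
return
  <run>{
    for $hit in fn:subsequence(cts:search(fn:doc(), $q), 1, 10)
    return <hit uri="{xdmp:node-uri($hit)}" score="{cts:score($hit)}"/>
  }</run>
```

If my understanding is right, the two runs may score the same documents differently when "fast element word searches" is enabled, because the second form can draw its term statistics from the element+word combination rather than from the word anywhere in the fragment.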
For cts:element-query(), TF and IDF are based on the number of documents (fragments) that contain the term anywhere in the document (fragment). For cts:element-word-query(), TF and IDF are based on the number of documents (fragments) that contain this (element + word) combination *if* you have "fast element word searches" ON (which is the default). If not, the TF and IDF of the term reflect the number of occurrences across the database (in any element), which is the same as for cts:element-query.

I brought up phrase-throughs because, in cts:element-word-query, any descendants you want queried (below the specified QName) must be specified as a phrase-through. This is not required when using cts:element-query, as descendant nodes below the specified QName will be interrogated. However, you would still need to specify phrase-throughs if you truly wanted a phrase search to cross one of the descendant QName boundaries.

I hope this is not too confusing, and I hope that I haven't misspoken. I'm sure someone from MarkLogic will set us straight if what I've said above is wrong.

Darin.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Sokolov
Sent: Monday, May 07, 2007 8:46 AM
To: 'General Mark Logic Developer Discussion'
Subject: RE: [MarkLogic Dev General] Query weights

Thanks, Darin;

It sounds like you are saying that weighting is driven by the existence of an index, and there is no index applying to "an element and all its descendants". Is that really true? Anyone from Mark Logic want to weigh in?

-Mike

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Darin McBeath
Sent: Tuesday, May 08, 2007 8:02 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Query weights

You might want to try cts:element-word-query. Depending on what indexes you have enabled, TF and IDF will be based only on the QName (and specified phrase-through descendants).
For cts:element-query, TF and IDF are based on the entire document (fragment). At least that is my understanding.

Darin.

--- Peter Hickman <[EMAIL PROTECTED]> wrote:

> Michael Blakeley wrote:
> > I think the interesting point for your question is that scores are
> > calculated based on inverse document frequency (IDF) as well as term
> > frequency (TF). If that doesn't suit your application, you can choose
> > an alternative scoring technique: try score-logtf, or score-simple, as
> > options to cts:search() -
> > http://developer.marklogic.com/pubs/3.1/apidocs/SearchBuiltins.html#search
> > has more information.
>
> Sorry for the lateness of the reply (stuff cropping up at home, then a
> bank holiday, etc etc etc). I have tried "score-logtfidf",
> "score-logtf", and even "score-simple". And although they change the
> scores they do not seem to change the ordering. The problem as I see it
> is that a weighting that applies to documents that matched on dc:title
> seems to be applied to documents that do not match the dc:title. Given,
>
> cts:element-query(xs:QName("dc:title"),cts:word-query("bach",(),16)),
> cts:element-query(xs:QName("opp:body"),cts:word-query("bach"))
>
> in a cts:or-query should boost the score of documents that match "bach"
> in the dc:title element and not boost the score for documents that do
> not. However the examples show that the score of documents that do not
> have "bach" in the dc:title element are being boosted along with those
> that do. This is confusing and makes me feel like I have no idea as to
> what is going on. My understanding is that results 11 and 12 should not
> have been boosted, but they were. I need to know why if I am to make use
> of this facility.
>
> --
> Peter Hickman.
>
> Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
> t: 01273 722222
> f: 01273 723232
> e: [EMAIL PROTECTED]
> w: www.semantico.com
>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
