No, I'm not trying to say weighting is driven by the index. I was
trying to explain the differences (irrespective of weighting) in how TF
and IDF are calculated for cts:element-query and cts:element-word-query.

For cts:element-query(), TF and IDF are based on occurrences of the term
anywhere in the document (fragment), and on the number of documents
(fragments) that contain the term anywhere.

For cts:element-word-query(), TF and IDF are based on the number of
documents (fragments) that contain this combination of (element+word)
*if* you have "fast element word searches" ON (which is the default). If
not, the TF and IDF of the term reflect the number of occurrences across
the database (in any element) ... which is the same as for
cts:element-query.
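To make the difference in scope concrete, here is a toy model in Python.
It is not MarkLogic's scoring formula (that is not reproduced here); it
only shows how counting documents "with the word anywhere" versus "with
the word inside a given element" changes the IDF for the same word. The
corpus, the plain log(N/df) IDF, and the `fast_element_word_searches`
flag (a stand-in for the database setting mentioned above) are all
hypothetical.

```python
import math

# Hypothetical corpus: each document maps element names to their text.
DOCS = [
    {"dc:title": "Bach cantatas", "opp:body": "Notes on Bach and Handel"},
    {"dc:title": "Handel operas", "opp:body": "Bach is mentioned once"},
    {"dc:title": "Mozart letters", "opp:body": "Bach influenced Mozart"},
    {"dc:title": "Haydn symphonies", "opp:body": "No mention here"},
]


def words(text):
    """Lower-cased whitespace tokenization (a simplification)."""
    return text.lower().split()


def idf(doc_freq, n_docs):
    """Plain log(N / df); a toy stand-in for a real engine's IDF."""
    return math.log(n_docs / doc_freq) if doc_freq else 0.0


def df_anywhere(word, docs):
    """Documents containing the word in any element (element-query style)."""
    return sum(any(word in words(t) for t in d.values()) for d in docs)


def df_in_element(element, word, docs):
    """Documents containing the (element+word) pair (element-word-query style)."""
    return sum(word in words(d.get(element, "")) for d in docs)


def element_word_idf(element, word, docs, fast_element_word_searches=True):
    """Toy IDF for an element-word query: element-scoped when the
    hypothetical flag is on, database-wide (any element) when it is off."""
    if fast_element_word_searches:
        return idf(df_in_element(element, word, docs), len(docs))
    return idf(df_anywhere(word, docs), len(docs))


if __name__ == "__main__":
    n = len(DOCS)
    print("anywhere:", idf(df_anywhere("bach", DOCS), n))
    print("dc:title:", element_word_idf("dc:title", "bach", DOCS))
    print("flag off:", element_word_idf("dc:title", "bach", DOCS, False))
```

In this corpus "bach" appears somewhere in 3 of 4 documents but in only
1 dc:title, so the element-scoped IDF is log 4 (about 1.39) while the
"anywhere" IDF is log 4/3 (about 0.29); with the flag off, both come out
the same.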

I brought up phrase-throughs because with cts:element-word-query, any
descendants you want queried (below the specified QName) must be
specified as phrase-throughs. This is not required when using
cts:element-query ... as descendant nodes below the specified QName will
be searched. However, you would still need to specify phrase-throughs
if you truly wanted a phrase search to cross one of the descendant QName
boundaries.
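The traversal difference described above can be sketched with Python's
xml.etree. This is only a model of the behaviour as stated here, not
MarkLogic's implementation: the element names (`title`, `i`) and the
`phrase_throughs` set are hypothetical, and a skipped child is assumed
to act as a word break.

```python
import xml.etree.ElementTree as ET


def text_all_descendants(elem):
    """All text under elem, every descendant included (element-query style)."""
    return " ".join("".join(elem.itertext()).split())


def text_with_phrase_throughs(elem, phrase_throughs):
    """Text directly in elem, descending only into children whose tag is in
    the (hypothetical) phrase-through set (element-word-query style)."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag in phrase_throughs:
            parts.append(text_with_phrase_throughs(child, phrase_throughs))
        else:
            parts.append(" ")  # skipped child: its text is not searched
        parts.append(child.tail or "")  # text after the child belongs to elem
    return " ".join("".join(parts).split())


def matches(text, word):
    """Case-insensitive whole-word match."""
    return word.lower() in text.lower().split()


TITLE = ET.fromstring("<title>Music of <i>Bach</i> cantatas</title>")

if __name__ == "__main__":
    print(matches(text_all_descendants(TITLE), "bach"))
    print(matches(text_with_phrase_throughs(TITLE, set()), "bach"))
    print(matches(text_with_phrase_throughs(TITLE, {"i"}), "bach"))
```

In the model, "bach" inside `<i>` is found by the all-descendants
traversal, missed by the element-scoped one, and found again once `i` is
added to the phrase-through set.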

I hope this is not too confusing ... and I hope that I haven't misspoken
... I'm sure someone from MarkLogic will set us straight if what I've
said above is wrong.

Darin.


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mike Sokolov
Sent: Monday, May 07, 2007 8:46 AM
To: 'General Mark Logic Developer Discussion'
Subject: RE: [MarkLogic Dev General] Query weights

Thanks, Darin; 

It sounds like you are saying that weighting is driven by the existence
of an index, and there is no index applying to "an element and all its
descendants". Is that really true? Anyone from Mark Logic want to weigh
in?

-Mike

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Darin McBeath
Sent: Tuesday, May 08, 2007 8:02 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Query weights


You might want to try cts:element-word-query ...
depending on what indexes you have enabled, TF and IDF
will be based only on the QName (and specified
phrase-through descendants).

For cts:element-query, TF and IDF are based on the
entire document (fragment).

At least that is my understanding.

Darin.

--- Peter Hickman <[EMAIL PROTECTED]> wrote:

> Michael Blakeley wrote:
> > I think the interesting point for your question is that scores are
> > calculated based on inverse document frequency (IDF) as well as term
> > frequency (TF). If that doesn't suit your application, you can choose
> > an alternative scoring technique: try score-logtf or score-simple as
> > options to cts:search() -
> > http://developer.marklogic.com/pubs/3.1/apidocs/SearchBuiltins.html#search
> > has more information.
> >
> 
> Sorry for the lateness of the reply (stuff cropping up at home, then a
> bank holiday, etc etc etc). I have tried "score-logtfidf",
> "score-logtf", and even "score-simple". And although they change the
> scores, they do not seem to change the ordering. The problem as I see
> it is that a weighting that applies to documents that matched on
> dc:title seems to be applied to documents that do not match the
> dc:title. Given
>
> cts:element-query(xs:QName("dc:title"),cts:word-query("bach",(),16)),
> cts:element-query(xs:QName("opp:body"),cts:word-query("bach"))
>
> in a cts:or-query, the query should boost the score of documents that
> match "bach" in the dc:title element and not boost the score of
> documents that do not. However, the examples show that the scores of
> documents that do not have "bach" in the dc:title element are being
> boosted along with those that do. This is confusing and makes me feel
> like I have no idea what is going on. My understanding is that results
> 11 and 12 should not have been boosted, but they were. I need to know
> why if I am to make use of this facility.
> 
> --
> Peter Hickman.
> 
> Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
> t: 01273 722222
> f: 01273 723232
> e: [EMAIL PROTECTED]
> w: www.semantico.com
> 
> _______________________________________________
> General mailing list
> [email protected] 
> http://xqzone.com/mailman/listinfo/general
> 
