Thanks for the response.

This seems odder the more I look at it. Firstly, here are the queries that are used:

(:
   The "indian subcontinent" query
:)

declare namespace dc = "http://purl.org/dc/elements/1.1/";
declare namespace opp = "http://opp.oup.com/opp";
declare namespace grove = "http://www.grovecms.com/local/articles.dtd";

default element namespace = "http://opp.oup.com/opp";

let $query :=
 cts:and-query
 (
  (
   cts:element-query(xs:QName("opp:body"),"indian"),
   cts:element-query(xs:QName("opp:body"),"subcontinent")
  )
 )
return
 for $doc at $index in
  cts:search
  (
   /doc[not(opp:meta/opp:headword-matches/opp:self/@status = 'secondary')],
   $query
  )
 return <result at="{$index}">{base-uri($doc)}</result>

This returns 379 results with the target document at #1. However, the following query (only the $query part changes) returns 27 results, and the target document is not among them:

(:
   The "indian subcontinent bronze" query
:)

let $query :=
 cts:and-query
 (
  (
   cts:element-query(xs:QName("opp:body"),"indian"),
   cts:element-query(xs:QName("opp:body"),"subcontinent"),
   cts:element-query(xs:QName("opp:body"),"bronze")
  )
 )

At this point I started to look into "bronze" itself. In the target document the term occurs nearly 100 times, quite a few of them in grove:P elements, so I searched for this:

(:
   Bronze in grove:P
:)

let $query := cts:element-query(xs:QName("grove:P"),"bronze")

I get 2,484 unique documents returned, with the target at #304. The grove:P element is itself a descendant of the opp:body element, so I then searched for "bronze" in opp:body:

(:
   Bronze in opp:body
:)

let $query := cts:element-query(xs:QName("opp:body"),"bronze")

Here I get 2,154 unique documents, which do not include the target. The thing to note is that every document has an opp:body (all those grove:P matches were descendants of opp:body elements), and yet we get fewer matches! That alone makes no sense, and on top of it the 2,154 documents returned by the opp:body query include some that were not in the grove:P results list. The total number of unique documents across both lists is 2,836.
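
As a sanity check on those figures: 2,484 + 2,154 - 2,836 = 1,802, so if the counts hold, 1,802 documents appear in both lists, 682 only in the grove:P list and 352 only in the opp:body list. Below is a rough, untested sketch of how the two lists could be compared directly. It reuses the search expression from the first query (written as /opp:doc so that no default namespace declaration is needed); the element names in the returned summary are made up for illustration.

(:
   Sketch: compare the grove:P and opp:body result lists for "bronze"
:)

declare namespace opp = "http://opp.oup.com/opp";
declare namespace grove = "http://www.grovecms.com/local/articles.dtd";

(: URIs of documents matching "bronze" inside grove:P :)
let $p-uris :=
 distinct-values
 (
  for $doc in
   cts:search
   (
    /opp:doc[not(opp:meta/opp:headword-matches/opp:self/@status = 'secondary')],
    cts:element-query(xs:QName("grove:P"),"bronze")
   )
  return string(base-uri($doc))
 )
(: URIs of documents matching "bronze" inside opp:body :)
let $body-uris :=
 distinct-values
 (
  for $doc in
   cts:search
   (
    /opp:doc[not(opp:meta/opp:headword-matches/opp:self/@status = 'secondary')],
    cts:element-query(xs:QName("opp:body"),"bronze")
   )
  return string(base-uri($doc))
 )
let $only-in-p := $p-uris[not(. = $body-uris)]
let $only-in-body := $body-uris[not(. = $p-uris)]
return
 <comparison>
  <both>{count($p-uris[. = $body-uris])}</both>
  <only-in-p>{count($only-in-p)}</only-in-p>
  <only-in-body>{count($only-in-body)}</only-in-body>
  {
   (: the suspicious ones: matched in grove:P but not in the enclosing opp:body :)
   for $uri in $only-in-p
   return <missing-from-body>{$uri}</missing-from-body>
  }
 </comparison>

If the target shows up under missing-from-body, that would at least confirm it is one of many documents affected rather than a one-off.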

I have checked the target document, and the grove:P element with "bronze" in it is definitely a descendant of opp:body; the document appears to be no different from the ones that MarkLogic did return.
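
The same check could be made against the loaded document itself rather than by eye. This is only a sketch: the URI is a placeholder rather than the real path, the check element names are invented, and it assumes cts:contains is available in the server version in use.

(:
   Sketch: inspect the target document directly
:)

declare namespace opp = "http://opp.oup.com/opp";
declare namespace grove = "http://www.grovecms.com/local/articles.dtd";

(: placeholder URI, substitute the real target document :)
let $target := doc("/path/to/target.xml")
let $bronze := cts:word-query("bronze")
return
 <check>
  <body-elements>{count($target//opp:body)}</body-elements>
  <p-with-bronze>{count($target//grove:P[cts:contains(., $bronze)])}</p-with-bronze>
  <p-with-bronze-under-body>{count($target//opp:body//grove:P[cts:contains(., $bronze)])}</p-with-bronze-under-body>
  <body-contains-bronze>{cts:contains($target//opp:body, $bronze)}</body-contains-bronze>
 </check>

If body-contains-bronze comes back true while the opp:body element query still skips the document, the document itself is fine and the problem would seem to lie in how the search is resolved.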

Unfortunately, it would seem that once again I have come up against a problem on a Friday before I go off for a week's holiday :) Have a happy July the 4th; I suspect that this will still be here when I get back.

--
Peter Hickman.

Semantico, Lees House, 21-23 Dyke Road, Brighton BN1 3FE
t: 01273 358223
f: 01273 723232
e: [EMAIL PROTECTED]
w: www.semantico.com
