Hi Ryan,

Even though the plan says the path is "fully searchable," that doesn't
necessarily mean it's using the indexes to find all the appropriate results. An
expression using not() will definitely depend on filtering, because you can't
search for the absence of something using the indexes. Compare the output of
xdmp:plan() or xdmp:query-trace() for these two expressions:

/stuff[child]
/stuff[not(child)]

In the first case, the index resolution should be very close to 100% (nearly
every candidate document is a real match), since the Server can look up all
documents that have both <stuff> and <child>. In the second case, it will find
all documents that have <stuff>, and the predicate [not(child)] will have to be
resolved at the filtering stage (reading each <stuff> document to see whether
<child> is absent). In the plan output, if you see "Step 2 predicate 1
contributed 1 constraint: child", that's an encouraging sign that the Server is
using the index to evaluate the predicate.
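The difference between the two expressions can be sketched with a small Python
toy model. This is not MarkLogic's implementation or API: the documents, the toy
index (element name mapped to the ids of documents that contain it anywhere),
and the function names are all hypothetical. It only illustrates the two
stages: index resolution narrows the candidates with set lookups, and filtering
parses each candidate to check what the index cannot.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, shaped like the ones in this thread.
DOCS = {
    "d1": "<stuff><child>Alan</child></stuff>",
    "d2": "<stuff><child/></stuff>",
    "d3": "<stuff/>",
    "d4": "<other><child>Bob</child></other>",
}

def build_element_index(docs):
    """Toy element index: element name -> ids of documents containing it anywhere."""
    index = {}
    for doc_id, text in docs.items():
        for elem in ET.fromstring(text).iter():
            index.setdefault(elem.tag, set()).add(doc_id)
    return index

def root_stuff_has_child(text):
    """The filter: parse the document and test /stuff[child] exactly."""
    root = ET.fromstring(text)
    return root.tag == "stuff" and root.find("child") is not None

def search_with_child(docs, index):
    # /stuff[child]: both names are positive terms, so intersecting their
    # index entries already narrows the candidates to likely matches.
    candidates = index.get("stuff", set()) & index.get("child", set())
    return candidates, {d for d in candidates if root_stuff_has_child(docs[d])}

def search_without_child(docs, index):
    # /stuff[not(child)]: the index can only supply the <stuff> documents;
    # the absence of <child> is checked by reading every candidate.
    candidates = set(index.get("stuff", set()))
    results = {d for d in candidates
               if ET.fromstring(docs[d]).tag == "stuff"
               and not root_stuff_has_child(docs[d])}
    return candidates, results

index = build_element_index(DOCS)
for name, search in [("/stuff[child]", search_with_child),
                     ("/stuff[not(child)]", search_without_child)]:
    candidates, results = search(DOCS, index)
    print(name, "candidates:", sorted(candidates), "results:", sorted(results))
```

Running it prints `/stuff[child] candidates: ['d1', 'd2'] results: ['d1', 'd2']`
and `/stuff[not(child)] candidates: ['d1', 'd2', 'd3'] results: ['d3']`: every
candidate in the first case is a real match, while the second has to read all
three <stuff> documents to keep one.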

If you need to squeeze out more performance, you can consider using
cts:not-query() or cts:and-not-query(), but be very careful with these, because
a false positive in the negated query will result in a false negative in the
result (missing results). Filtering can't recover such a document, since
filtering only removes candidates; it never adds them back.
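Why a false positive in the negated query turns into a missing result can be
shown with another Python toy model. Again, this is not MarkLogic's API or its
real index structure: the documents, the toy index (which records only which
element names occur somewhere in a document), and the function names are
hypothetical. Document d5 contains a <child>, but not directly under <stuff>,
so "contains <child>" is a false positive for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; d5 has a <child>, but not directly under <stuff>.
DOCS = {
    "d1": "<stuff><child>Alan</child></stuff>",
    "d3": "<stuff/>",
    "d5": "<stuff><note><child>Bob</child></note></stuff>",
}

def build_element_index(docs):
    """Toy element index: element name -> ids of documents containing it anywhere."""
    index = {}
    for doc_id, text in docs.items():
        for elem in ET.fromstring(text).iter():
            index.setdefault(elem.tag, set()).add(doc_id)
    return index

def is_stuff_without_child(text):
    """Exact answer to /stuff[not(child)], computed by reading the document."""
    root = ET.fromstring(text)
    return root.tag == "stuff" and root.find("child") is None

def filtered_not(docs, index):
    # Negation left to the filter: every <stuff> document is a candidate.
    candidates = index.get("stuff", set())
    return {d for d in candidates if is_stuff_without_child(docs[d])}

def index_and_not(docs, index):
    # Negation resolved from the toy index, in the spirit of an and-not query:
    # subtracting the "contains <child>" set drops d5 before filtering, and
    # the filter below can only remove candidates, never restore them.
    candidates = index.get("stuff", set()) - index.get("child", set())
    return {d for d in candidates if is_stuff_without_child(docs[d])}

index = build_element_index(DOCS)
print(sorted(filtered_not(DOCS, index)))   # ['d3', 'd5']
print(sorted(index_and_not(DOCS, index)))  # ['d3']  (d5 silently missing)
```

The index-side negation is cheaper because it never parses d1 or d5, but the
price is that its mistake is invisible: nothing in the output says d5 was lost.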


Evan Lenz
Software Developer, Community
MarkLogic Corporation

email  [email protected]
web    developer.marklogic.com<http://developer.marklogic.com/>


From: "[email protected]" <[email protected]>
Reply-To: General MarkLogic Developer Discussion <[email protected]>
Date: Tue, 31 May 2011 14:24:28 -0700
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] using fn:not() in queries

I'm hoping someone can validate my understanding. From what I can tell through
performance measuring, using fn:not() does not automatically make a query
unsearchable (i.e., unable to go against the index).

Given XML files in the DB that look like this:

<stuff>
     <child>Alan</child>
</stuff>

where there will always be a child element but there may or may not be a value.

If I write an XPath expression like this:

/stuff[fn:not(child/text())]

Then according to xdmp:plan(), the XPath is fully searchable, and according to
the profiler, it runs without any performance penalty from the fn:not().

So my questions are: Are absent element values as fast (or nearly as fast) as
existing values in terms of using the indexes in queries? In other words, is
querying for the absence of a value as fast as querying for a value?

Is there a faster way to query for an absent or empty value? I could change the 
data so that there would be no "child" element if there is no value for the 
element. Would that matter in terms of performance? Would it be faster to have 
"<child values-exists='no'></child>" and use that attribute in a positive query 
rather than just <child/> with a negative query?

From my testing, it seems like using fn:not() in this case is just as good as
anything else. But I suspect there's more to the story.

Thanks,
Ryan
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
