Which query tuning/analysis tool should be used to determine if 
descendant::some-node is hitting an index?

I eventually realized that I didn't have all of the descendant elements I was 
searching for set as fragment roots, which once configured sped up the query 
almost 100x in some cases. I was using xdmp:query-trace(), which indicated that 
my paths were all fully searchable; however, I don't know how I would have 
determined which additional elements needed to be set as fragment roots.
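
For anyone reading this later: query-trace and xdmp:plan report whether a path is searchable, not how much work the filter does afterwards. One rough check, offered as a sketch rather than a definitive answer, is to compare the unfiltered fragment estimate with the filtered count. The directory "/books/" and the value "123" below are hypothetical placeholders, and `section` stands in for whichever element is being searched.

```xquery
xquery version "1.0-ml";

(: Hypothetical values; substitute a real directory and attribute value. :)
declare variable $coll as xs:string := "/books/";
declare variable $enum as xs:string := "123";

(: xdmp:estimate returns the number of fragments the indexes select,
   without filtering them. :)
let $estimated :=
  xdmp:estimate(xdmp:directory($coll, "infinity")//section[@enum eq $enum])
(: fn:count forces the filtered evaluation, so it counts actual matches. :)
let $actual :=
  fn:count(xdmp:directory($coll, "infinity")//section[@enum eq $enum])
return ($estimated, $actual)
```

If the estimate is larger than the count, the indexes selected fragments that filtering then had to reject. This does not by itself name the elements that should become fragment roots, but it does show where index resolution and the real result diverge.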

Thanks,

Will


From: [email protected] 
[mailto:[email protected]] On Behalf Of Will Thompson
Sent: Thursday, July 28, 2011 11:57 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] xpath to cts query question

Thanks Darin. I did some more testing, and it looks like an optimized XPath 
approach (more along the lines of Mike's second suggestion) might actually be 
the fastest. I'm not sure how to use a cts:uris query like you suggested, so I 
didn't get a chance to test it. This is what I landed on:

xdmp:directory($coll, 
"infinity")/(descendant-or-self::chapter|descendant-or-self::subchapter|descendant-or-self::section)
 [@enum eq $enum]

The fastest cts:query approach was a naïve union of cts:searches, but it was 
still much slower than the XPath:

(cts:search(xdmp:directory($coll, "infinity")//chapter,
            cts:element-attribute-value-query(
                xs:QName("chapter"),
                xs:QName("enum"),$enum))
|cts:search(xdmp:directory($coll, "infinity")//subchapter,
            cts:element-attribute-value-query(
                xs:QName("subchapter"),
                xs:QName("enum"),$enum))
|cts:search(xdmp:directory($coll, "infinity")//section,
            cts:element-attribute-value-query(
                xs:QName("section"),
                xs:QName("enum"),$enum)))

Based on CQ's profiler, the cts:searches had far fewer expressions to evaluate 
(17), but still took 0.147s to execute, while the XPath evaluated 1370 
expressions and executed in only 0.01s. I don't know the explanation for this 
other than that the ML XPath evaluator must be pretty good; a query-trace() 
confirmed that it was also hitting the range index. It also appears that there 
is some overhead to a cts:search that XPath doesn't have.
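
A comparison like this can also be repeated outside the profiler. The following is a minimal sketch, assuming hypothetical values for $coll and $enum, that times the XPath with xdmp:elapsed-time(). Lazy evaluation and cache warmth can shift where the time is spent, so the numbers should be treated as rough.

```xquery
xquery version "1.0-ml";

(: Hypothetical values; substitute a real directory and attribute value. :)
declare variable $coll as xs:string := "/books/";
declare variable $enum as xs:string := "123";

(: Elapsed time since the start of this request, read before the query. :)
let $start := xdmp:elapsed-time()
let $hits :=
  xdmp:directory($coll, "infinity")
    /(descendant-or-self::chapter
      | descendant-or-self::subchapter
      | descendant-or-self::section)[@enum eq $enum]
(: Counting forces the sequence to be evaluated before the clock is read again. :)
let $n := fn:count($hits)
return ($n, xdmp:elapsed-time() - $start)
```

Swapping the expression bound to $hits for the union of cts:searches gives the other side of the comparison; running each variant several times helps separate cold-cache from warm-cache timings.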

Thanks for all of your suggestions.

-Will


From: [email protected] 
[mailto:[email protected]] On Behalf Of Darin McBeath
Sent: Thursday, July 28, 2011 8:21 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] xpath to cts query question

One other idea you might want to try (not sure if it will help or not but would 
be easy enough to experiment with).

Assuming you have a Lexicon URI, you could use a cts:uris query to identify 
those documents containing the chapter, subchapter, or section with the 
attribute value.  You could then iterate over these URIs with a doc($uri) and a 
simple XPath expression to retrieve only the chapter, subchapter, section with 
the attribute value.  If you don't know the QNames in advance, you could use 
xdmp:unpath.  Logically, I believe this is what Mike is suggesting below ... 
just doing it a bit differently.  Like I said, I don't know if it would be 
slower or faster.
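
A minimal sketch of the cts:uris idea, assuming the URI lexicon is enabled on the database and using hypothetical values for $coll and $enum. The element and attribute names come from the thread; everything else is illustrative and untested against this data.

```xquery
xquery version "1.0-ml";

(: Hypothetical values; substitute a real directory and attribute value. :)
declare variable $coll as xs:string := "/books/";
declare variable $enum as xs:string := "123";

let $qnames :=
  (xs:QName("chapter"), xs:QName("subchapter"), xs:QName("section"))
(: Restrict to the directory and to documents containing a matching attribute. :)
let $query :=
  cts:and-query((
    cts:directory-query($coll, "infinity"),
    cts:element-attribute-value-query($qnames, xs:QName("enum"), $enum)))
(: cts:uris reads matching URIs from the URI lexicon (unfiltered). :)
for $uri in cts:uris("", (), $query)
(: XPath then picks only the matching element inside each document. :)
return doc($uri)//(chapter | subchapter | section)[@enum eq $enum]
```

Because the lexicon lookup is unfiltered, a URI can come back for a document with no true match; the final predicate discards those cases.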

Darin.

________________________________
From: Michael Blakeley <[email protected]>
To: General MarkLogic Developer Discussion <[email protected]>
Sent: Wednesday, July 27, 2011 7:59 PM
Subject: Re: [MarkLogic Dev General] xpath to cts query question

You might get a little faster by using doc() for the cts:search arg1, and 
relying on XPath to walk the trees. And since you are doing that, don't bother 
with filtering in cts:search.

let $qnames :=
  for $i in ("chapter", "subchapter", "section")
  return xs:QName($i)
return
  cts:search(
    doc(),
    cts:and-query(
      (cts:directory-query($coll, "infinity"),
      cts:element-attribute-value-query(
        $qnames, xs:QName("enum"), $enum))),
    "unfiltered")//(chapter | subchapter | section)[@enum eq $enum]

I don't know if that will be faster or slower, but it's worth a try.

Another variation is to go back to the XPath, and enumerate all the possible 
path expressions.

  /(a[@v eq $v] | a/b[@v eq $v] | a/b/c[@v eq $v])

-- Mike

On 27 Jul 2011, at 16:50 , Will Thompson wrote:

> This is the best I could come up with:
>
> let $qnames :=
>   for $i in ("chapter","subchapter","section")
>   return xs:QName($i)
> return cts:search(//(chapter|subchapter|section),
>            cts:and-query((
>                cts:directory-query($coll,"infinity"),
>                cts:element-attribute-value-query(
>                    $qnames, xs:QName("enum"), $enum))))[@enum eq $enum]
>
> This is still about twice as fast as the xpath, even if I can't easily work 
> around the predicate at the end.
>
> What's interesting is that the value query is slightly faster than the 
> element range query. I assume they're both using the range index, and the 
> value query is just doing it in fewer steps.
>
> Thanks for your help.
>
> -Will
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Will Thompson
> Sent: Wednesday, July 27, 2011 6:07 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] xpath to cts query question
>
> The data unfortunately won't allow for a more specific path. I was trying to 
> do something along the lines of what Mike suggested to utilize an attribute 
> range index, but the problem is that the cts:search will return multiple 
> documents because it includes ancestors, while the XPath does not.
>
> Here's a less vague example:
>
> //(chapter|subchapter|section)[@enum="123"] will only return, say, the 
> section that matches,
>
> but this:
>
> cts:search(
>  doc(),
>  cts:element-attribute-range-query(
>    (xs:QName("chapter"), xs:QName("subchapter"), xs:QName("section")),
>    xs:QName("enum"),
>    "=",
>    $enum)
> )
>
> will return the section and its ancestor chapter and subchapter, since they 
> are included in the searchable expression. The only way I could think to work 
> around this is separate queries, each with a searchable expression that 
> corresponds to the range query.
>
> -Will
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Danny Sokolsky
> Sent: Wednesday, July 27, 2011 5:51 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] xpath to cts query question
>
> Another thing that might help is if you know the full path to the nodes.  
> With //, it will have to look for the nodes anywhere in the documents.
>
> -Danny
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Wednesday, July 27, 2011 3:49 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] xpath to cts query question
>
> xdmp:plan helps with that:
>
> ...
>  <qry:info-trace>Analyzing path: 
> fn:collection()/descendant-or-self::node()/(a|b|c|d|e|f|g)[@foo = 
> "bar"]</qry:info-trace>
>  <qry:info-trace>Step 1 is searchable: fn:collection()</qry:info-trace>
>  <qry:info-trace>Step 2 does not use indexes: 
> descendant-or-self::node()</qry:info-trace>
>  <qry:info-trace>Step 3 is searchable: (a|b|c|d|e|f|g)[@foo = 
> "bar"]</qry:info-trace>
>  <qry:info-trace>Path is fully searchable.</qry:info-trace>
>  <qry:info-trace>Gathering constraints.</qry:info-trace>
>  <qry:info-trace>Step 3 predicate 1 contributed 1 constraint: @foo = 
> "bar"</qry:info-trace>
> ...
>
> The cts:query would be something like:
>
> xdmp:plan(
>  cts:search(doc(),
>    cts:element-attribute-value-query(
>      for $i in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
>      return xs:QName($i),
>      xs:QName('foo'), 'bar')))
>
> If that isn't fast enough, the next step might be an element-attribute range 
> index on every element-attribute combination, and switching to 
> cts:element-attribute-range-query with operator '='.
>
> -- Mike
>
> On 27 Jul 2011, at 15:36 , Will Thompson wrote:
>
>> Thanks Danny. What I'm mainly trying to do is speed up some slow XPath. I've 
>> optimized a lot of this module, but this XPath seems to be one of the 
>> remaining bottlenecks: //(a|b|c|d|e|f|g)[@foo = "bar"]. I thought that by 
>> converting it to a cts:query it would be faster. Or is this XPath already 
>> going to be optimized by MLS?
>>
>> -Will
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Danny Sokolsky
>> Sent: Wednesday, July 27, 2011 5:20 PM
>> To: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] xpath to cts query question
>>
>> Hi Will,
>>
>> I might not be understanding what you are doing here, but here are a few 
>> ideas.
>>
>> I think you can use that XPath in the first arg of cts:search, as long as 
>> you do not put any variables in it.  Something like this:
>>
>> cts:search(//(a|b|c|d|e|f|g)[@foo = "bar"], "hello")
>>
>> Also, in cts:query, you can do a cts:element-query with a 
>> cts:element-attribute-word-query as its second arg.  Something like:
>>
>> cts:element-query((xs:QName("a"), xs:QName("b")),
>> cts:element-attribute-word-query((xs:QName("a"),
>>    xs:QName("b")), xs:QName("foo"), "bar"))
>>
>> -Danny
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Will Thompson
>> Sent: Wednesday, July 27, 2011 2:51 PM
>> To: General MarkLogic Developer Discussion
>> Subject: [MarkLogic Dev General] xpath to cts query question
>>
>> I'm trying to create the cts equivalent of essentially this:
>>
>> //(a|b|c|d|e|f|g)[@attr = $val]
>>
>> But it seems like I would have to join multiple cts:search()s, one for each 
>> element, since I only want the matching element, and not its parent (so I 
>> can't do something like cts:search(//(a|b|c|d|e|f|g), 
>> cts:element-attribute-value-query((xs:QName("a"),...,xs:QName("g")),xs:QName("attr"),$val))).
>>
>> cts:search(//a, 
>> cts:element-attribute-value-query(xs:QName("a"),xs:QName("attr"),$val))
>> | cts:search(//b, 
>> cts:element-attribute-value-query(xs:QName("b"),xs:QName("attr"),$val))
>> | cts:search(//c, 
>> cts:element-attribute-value-query(xs:QName("c"),xs:QName("attr"),$val))
>> ...
>> | cts:search(//g, 
>> cts:element-attribute-value-query(xs:QName("g"),xs:QName("attr"),$val))
>>
>> Is there a better way to do this?
>>
>> Thank you!
>>
>> -Will
>> _______________________________________________
>> General mailing list
>> [email protected]<mailto:[email protected]>
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>
>
