Re: [MarkLogic Dev General] Improving XPath Performance on SearchResults

Michael Blakeley Thu, 13 Nov 2008 08:50:28 -0800

Steve,

Is it correct to say that your report will always look at every document matching the XPath collection()/doc/keyword/@value ? If so, I believe the bottleneck is that you are looping through all those documents N times, where N is close to count($distinct). You are also calling cts:search() N times, and each call has a cost.

Indexes are good for finding a needle in a haystack. But I think you want to bale the hay instead? If so, this is a case where I'd chuck the functional model and use map:map as an accumulator (http://developer.marklogic.com/pubs/4.0/apidocs/map.html).

This query hasn't been tested with your content, but it parses cleanly and evaluates on an empty database:


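let $report := map:map()
let $build :=
  (: Predicate reconstructed: the archive garbled it; $i must carry @value and @count. :)
  for $i in fn:collection()/doc/keywords/keyword[@value][@count]
  let $key as xs:string := fn:string($i/@value)
  (: map:get returns () for an unseen key, so default the running total to 0. :)
  let $count as xs:integer := xs:integer($i/@count) + (map:get($report, $key), 0)[1]
  return map:put($report, $key, $count)
for $key in map:keys($report)
return element keyword {
  attribute value { $key },
  attribute total { map:get($report, $key) }
}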

That should work as long as all the input docs fit within the expanded tree cache. If not, you'll need to use a more recursive approach. I'd suggest a module that handles N documents at a time, materializes the map as an XML document, and spawns itself again for the next N.

You might also consider using a range index and cts:frequency to avoid the need for the @count attribute entirely. From what you've told us, though, I don't know if that would be an appropriate solution.
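
As a rough sketch of the range-index idea, assuming a string attribute range index has been configured on keyword/@value (the index is an assumption, not something this thread confirms), the report could be read from the value lexicon instead of the documents. Note that cts:frequency reports how often the index saw each value, which only matches the report if that is what @count was meant to capture:

```xquery
xquery version "1.0-ml";
(: Assumption: a string attribute range index exists on keyword/@value. :)
for $value in cts:element-attribute-values(
                xs:QName("keyword"), xs:QName("value"))
return element keyword {
  attribute value { $value },
  (: Frequency comes from the index; it is not a sum of @count attributes. :)
  attribute total { cts:frequency($value) }
}
```

Nothing here loads a document, so the expanded tree cache limit does not come into play.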


-- Mike

Steve wrote:

Sorry if I seem to keep moving the goalposts. I've been playing with
some of the parameters, and experimenting with my XQuery and the best
performance I can get is using the following query. Unfortunately, it
still takes a considerable amount of time to execute.  Profiling shows
that the XPath $nodes/@count is taking the time.

let $allKeywords := fn:collection()/doc/keywords/keyword/@value
let $distinct := fn:distinct-values($allKeywords)

for $k in $distinct
  return
    let $search := cts:element-attribute-word-query(fn:QName("",
"keyword"), fn:QName("", "value"), $k)
    let $results := cts:search(fn:collection()/doc/keywords/keyword, $search)
    let $nodes := $results[@value eq $k]
    let $counts := $nodes/@count
       return
          <keyword value="{$k}" total="{fn:sum($counts)}" />


On Thu, Nov 13, 2008 at 9:21 AM, Steve <[EMAIL PROTECTED]> wrote:

I've been applying what I've learnt so far from this thread, but I'm
having a bit of trouble getting good performance when I put it all
together. The query I'm trying to execute in order to get the sum of a
count of keywords is below:

let $allKeywords := fn:collection()/doc/keywords/keyword/@value
let $distinct := fn:distinct-values($allKeywords)

for $k in $distinct
 return
   let $search := cts:element-attribute-word-query(fn:QName("",
"keyword"), fn:QName("", "value"), $k)
   let $results := cts:search(fn:collection()/doc/keywords/keyword,
$search)/@count
      return
fn:sum($results)

Running profiling on the query shows me that it's the XPath stuff I do
on the search results that's holding everything up, can anyone advise
how I can improve this?

Thanks

On Wed, Nov 12, 2008 at 7:24 PM, Michael Blakeley
<[EMAIL PROTECTED]> wrote:

To be fair, absorbing the architecture and indexing behavior of a modern
RDBMS isn't trivial either. XML content adds another dimension, but I hope
you find the performance guide at http://developer.marklogic.com/pubs/4.0/
helpful. There are also useful bits of server architecture discussion in the
dev and admin guides.

In the general case I wouldn't expect adding a range index to greatly
improve value query performance. The list cache is pretty efficient at
keeping frequently-used terms in memory.

Usually the range indexes are created for applications that need particular
features: fast sorting on a node value, fast range queries, fast access to
distinct values, etc.

-- Mike

Whitby, Rob, CMG wrote:

Wow, I didn't realise that. It will improve performance though right? On
a large database I assume the index of all XML elements and attributes
can't be held in memory.

Understanding how the functions relate to the indexes is probably one of
the areas I've found hardest with MarkLogic.

Thanks
Rob



-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael
Blakeley
Sent: 12 November 2008 16:55
To: General Mark Logic Developer Discussion
Cc: [EMAIL PROTECTED]
Subject: Re: [MarkLogic Dev General] Improving XPath Performance
on SearchResults

Actually that query does *not* require any special indexes. The server
always indexes all XML element values and element-attribute values.

You would only need an attribute range index for a fast "order by" on
keyword/@value, or for a cts:element-attribute-range-query term, or for
cts:element-attribute-values() and its associated functions.
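
For illustration only, here is a sketch of the lexicon route applied to the original question (classification values for documents with a given keyword). It assumes a string attribute range index on classification/@value, which nobody in this thread has said exists. The keyword query itself still needs no special index:

```xquery
xquery version "1.0-ml";
(: Assumption: a string attribute range index exists on classification/@value. :)
let $query := cts:element-attribute-value-query(
                xs:QName("keyword"), xs:QName("value"), "something")
(: Empty $start and $options; the fifth argument limits values to matching fragments. :)
return cts:element-attribute-values(
         xs:QName("classification"), xs:QName("value"),
         (), (), $query)
```

This returns atomic values rather than classification elements, and it is resolved from the indexes without filtering, so it suits a report more than a query that needs the nodes themselves.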

-- Mike

Whitby, Rob, CMG wrote:

If you put an attribute range index on keyword/@value you can do
something like this:

cts:search(
 /doc/classifications/classification,
 cts:element-attribute-value-query(xs:QName("keyword"),
  xs:QName("value"), "something")
)

(untested!)

Rob

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve
Sent: 12 November 2008 14:41
To: James Clippinger
Cc: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Improving XPath Performance
on SearchResults

I should probably add that I'm trying to extract all classification
values for the documents that have a specific keyword value.

On Wed, Nov 12, 2008 at 2:40 PM, Steve <[EMAIL PROTECTED]>
wrote:

Thanks for your response.

I've tried your suggestion and it doesn't really help. Looking at the
profiling document, I can see that it's clearly the XPath on the
document results that is slowing the whole thing down. Are there any other ways
that I can improve this? I've included a sample document (small), so you can
see what I'm trying to achieve.

<doc>
 <classifications>
  <classification value="123" />
  <classification value="324" />
 </classifications>
 <keywords>
  <keyword value="word1" />
  <keyword value="word2" />

 </keywords>
</doc>



On Wed, Nov 12, 2008 at 2:24 PM, James Clippinger
<[EMAIL PROTECTED]> wrote:

Steve, your query is doing some heavyweight filtering for the XPath
because it is doing two steps:

1) Execute the cts:search(): generate a list of all documents matching
the query in relevance order.

2) Execute the XPath: reorder the documents into document order and
find only those with /doc/classifications/classification elements, returning
those classification elements.

Since you are using XPath and thus returning results in document order,
you probably want to use cts:contains() in an XPath predicate
rather than cts:search().  cts:contains() in a rooted XPath expression
will use the search indexes when appropriate, so it's as fast as the
equivalent
cts:search() expression.  Try this:

let $search := cts:element-attribute-word-query(fn:QName("",
"keyword"), fn:QName("", "value"), "something")
return
fn:collection()/doc[cts:contains(., $search)]/classifications/classification

James

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve
Sent: Wednesday, November 12, 2008 8:54 AM
To: [email protected]
Subject: [MarkLogic Dev General] Improving XPath Performance on
SearchResults

I've written a query which I use to search my data set and I am able
to get the results back very quickly. However the results that I get
back show the complete document that the search matched, whereas I
want a particular node returned.
At the moment I'm doing this:

let $search := cts:element-attribute-word-query(fn:QName("",
"keyword"), fn:QName("", "value"), "something")
let $results :=
cts:search(fn:collection(), $search)/doc/classifications/classification
  return $results

I've tried profiling this query and I've found that there is a big lag
filtering the $results of the search using XPath.
Is there any way, either through using a different query or search
notation, or by indexes etc. that I can speed this up?

Thanks in advance...
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
