I've been applying what I've learnt so far from this thread, but I'm
having a bit of trouble getting good performance when I put it all
together. The query I'm trying to execute in order to get the sum of a
count of keywords is below:

let $allKeywords := fn:collection()/doc/keywords/keyword/@value
let $distinct := fn:distinct-values($allKeywords)

for $k in $distinct
return
  let $search := cts:element-attribute-word-query(
    fn:QName("", "keyword"), fn:QName("", "value"), $k)
  let $results :=
    cts:search(fn:collection()/doc/keywords/keyword, $search)/@count
  return fn:sum($results)

Profiling the query shows me that it's the XPath I apply to the search
results that's holding everything up. Can anyone advise how I can
improve this?

Thanks

On Wed, Nov 12, 2008 at 7:24 PM, Michael Blakeley
<[EMAIL PROTECTED]> wrote:
> To be fair, absorbing the architecture and indexing behavior of a modern
> RDBMS isn't trivial either. XML content adds another dimension, but I hope
> you find the performance guide at http://developer.marklogic.com/pubs/4.0/
> helpful. There are also useful bits of server architecture discussion in the
> dev and admin guides.
>
> In the general case I wouldn't expect adding a range index to greatly
> improve value query performance. The list cache is pretty efficient at
> keeping frequently-used terms in memory.
>
> Usually the range indexes are created for applications that need particular
> features: fast sorting on a node value, fast range queries, fast access to
> distinct values, etc.
>
> -- Mike
>
> Whitby, Rob, CMG wrote:
>>
>> Wow, I didn't realise that. It will improve performance though right? On
>> a large database I assume the index of all XML elements and attributes
>> can't be held in memory.
>>
>> Understanding how the functions relate to the indexes is probably one of
>> the areas I've found hardest with MarkLogic.
>>
>> Thanks
>> Rob
>>
>>
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Michael
>> Blakeley
>> Sent: 12 November 2008 16:55
>> To: General Mark Logic Developer Discussion
>> Cc: [EMAIL PROTECTED]
>> Subject: Re: [MarkLogic Dev General] Improving XPath Performance on
>> Search Results
>>
>> Actually that query does *not* require any special indexes. The server
>> always indexes all XML element values and element-attribute values.
>>
>> You would only need an attribute range index for a fast "order by" on
>> keyword/@value, or for a cts:attribute-value-range-query term, or for
>> cts:element-attribute-values() and its associated functions.
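>>
>> As a rough sketch of that last case, assuming a string range index has
>> already been configured on keyword/@value (the element and attribute
>> names are taken from the sample document in this thread, and this has
>> not been run against a real database), the distinct values and their
>> frequencies can be read from the index rather than from the documents:
>>
>> ```xquery
>> (: Assumption: an attribute range index of type string exists on
>>    keyword/@value; without it this call raises an error. :)
>> for $k in cts:element-attribute-values(
>>   xs:QName("keyword"), xs:QName("value"))
>> return
>>   (: cts:frequency() reports how often the value occurs in the index. :)
>>   <keyword value="{$k}" frequency="{cts:frequency($k)}"/>
>> ```
>>
>> By default cts:frequency() counts the fragments that contain each value,
>> so this lists the keywords without an fn:distinct-values() pass over
>> every keyword node.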
>>
>> -- Mike
>>
>> Whitby, Rob, CMG wrote:
>>>
>>> If you put an attribute range index on keyword/@value you can do
>>> something like this:
>>>
>>> cts:search(
>>>  /doc/classifications/classification,
>>>  cts:element-attribute-value-query(xs:QName("keyword"),
>>>    xs:QName("value"), "something")
>>> )
>>>
>>> (untested!)
>>>
>>> Rob
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf Of Steve
>>> Sent: 12 November 2008 14:41
>>> To: James Clippinger
>>> Cc: General Mark Logic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Improving XPath Performance on
>>> Search Results
>>>
>>> I should probably add that I'm trying to extract all classification
>>> values for the documents that have a specific keyword value.
>>>
>>> On Wed, Nov 12, 2008 at 2:40 PM, Steve <[EMAIL PROTECTED]>
>>> wrote:
>>>>
>>>> Thanks for your response.
>>>>
>>>> I've tried your suggestion and it doesn't really help. Looking at the
>>>> profiling document, I can see that it's clearly the XPath on the
>>>> document results that is slowing the whole thing down. Are there any
>>>> other ways that I can improve this? I've included a small sample
>>>> document so you can see what I'm trying to achieve.
>>>>
>>>> <doc>
>>>>  <classifications>
>>>>   <classification value="123" />
>>>>   <classification value="324" />
>>>>  </classifications>
>>>>  <keywords>
>>>>   <keyword value="word1" />
>>>>   <keyword value="word2" />
>>>>
>>>>  </keywords>
>>>> </doc>
>>>>
>>>>
>>>>
>>>> On Wed, Nov 12, 2008 at 2:24 PM, James Clippinger
>>>> <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Steve, your query is doing some heavyweight filtering for the XPath
>>>>> because it is doing two steps:
>>>>>
>>>>> 1) Execute the cts:search(): generate a list of all documents matching
>>>>> the query in relevance order.
>>>>>
>>>>> 2) Execute the XPath: reorder the documents into document order and
>>>>> find only those with /doc/classifications/classification elements,
>>>>> returning those classification elements.
>>>>>
>>>>> Since you are using XPath and thus returning results in document order,
>>>>> you probably want to use cts:contains() in an XPath predicate rather
>>>>> than cts:search().  cts:contains() in a rooted XPath expression will
>>>>> use the search indexes when appropriate, so it's as fast as the
>>>>> equivalent cts:search() expression.  Try this:
>>>>>
>>>>> let $search := cts:element-attribute-word-query(fn:QName("", "keyword"),
>>>>>   fn:QName("", "value"), "something")
>>>>> return
>>>>>   fn:collection()/doc[cts:contains(., $search)]/classifications/classification
>>>>>
>>>>> James
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [EMAIL PROTECTED]
>>>>>> [mailto:[EMAIL PROTECTED] On Behalf Of Steve
>>>>>> Sent: Wednesday, November 12, 2008 8:54 AM
>>>>>> To: [email protected]
>>>>>> Subject: [MarkLogic Dev General] Improving XPath Performance on
>>>>>> Search Results
>>>>>>
>>>>>> I've written a query which I use to search my data set and I am able
>>>>>> to get the results back very quickly. However, the results that I get
>>>>>> back show the complete document that the search matched, whereas I
>>>>>> want a particular node returned.
>>>>>> At the moment I'm doing this:
>>>>>>
>>>>>> let $search := cts:element-attribute-word-query(fn:QName("", "keyword"),
>>>>>>   fn:QName("", "value"), "something")
>>>>>> let $results :=
>>>>>>   cts:search(fn:collection(), $search)/doc/classifications/classification
>>>>>> return $results
>>>>>>
>>>>>> I've tried profiling this query and I've found that there is a big lag
>>>>>> filtering the $results of the search using XPath.
>>>>>> Is there any way, either through a different query or search
>>>>>> notation, or through indexes etc., that I can speed this up?
>>>>>>
>>>>>> Thanks in advance...
>>>>>> _______________________________________________
>>>>>> General mailing list
>>>>>> [email protected]
>>>>>> http://xqzone.com/mailman/listinfo/general
>>>>>>