The query I wrote loops through all the values in the index, and
cts:frequency returns how many fragments the value occurs in. I'm not
clear exactly what you're trying to do, but the start of your query can
use the index:

> let $allKeywords := fn:collection()/doc/keywords/keyword/@value
> let $distinct := fn:distinct-values($allKeywords)
>
> for $k in $distinct

can be replaced with: 

for $k in cts:element-attribute-values(xs:QName("keyword"),
xs:QName("value"))
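
If what you're after is the sum of the count attributes for each keyword
rather than the number of fragments, something along these lines might
work. This is only a sketch and untested: it assumes a string
element-attribute range index on keyword/@value, and the names $query,
$counts and the "total" attribute are just mine for illustration.

```xquery
xquery version "1.0-ml";

(: Distinct keyword values come straight from the range index,
   so there is no need to walk every document to build the list. :)
for $k in cts:element-attribute-values(xs:QName("keyword"), xs:QName("value"))
(: Match only the keyword elements whose @value matches this value. :)
let $query := cts:element-attribute-value-query(
                xs:QName("keyword"), xs:QName("value"), fn:string($k))
(: Pull the count attributes off the matching keyword elements. :)
let $counts := cts:search(fn:collection()/doc/keywords/keyword, $query)/@count
return <keyword value="{$k}" total="{fn:sum($counts)}"/>
```

The @count step still has to read the matching fragments, so this should
remove the distinct-values pass but probably not all of the time you saw
in the profiler.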


Hope this helps!



-----Original Message-----
From: Steve [mailto:[EMAIL PROTECTED] 
Sent: 13 November 2008 10:34
To: Whitby, Rob, CMG
Cc: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Improving XPath Performance
on SearchResults

I've looked at the query and tried to execute it, but I get an error
telling me that the argument passed to cts:frequency(..) is not of type
item(). And looking at the query, won't that return the count of
<keyword> elements rather than the sum of the count attributes of each
distinct keyword element?

On Thu, Nov 13, 2008 at 10:22 AM, Whitby, Rob, CMG
<[EMAIL PROTECTED]> wrote:
> This is where an index really is useful..
>
> for $keyword in cts:element-attribute-values(xs:QName("keyword"),
> xs:QName("value"))
> let $count := cts:frequency($keyword)
> order by $count descending
> return <keyword value="{$keyword}" count="{$count}"/>
>
>
>
>
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Steve
> Sent: 13 November 2008 10:13
> To: Michael Blakeley
> Cc: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Improving XPath Performance
> on SearchResults
>
> Sorry if I seem to keep moving the goalposts. I've been playing with
> some of the parameters, and experimenting with my XQuery and the best
> performance I can get is using the following query. Unfortunately, it
> still takes a considerable amount of time to execute. Profiling shows
> that the XPath $nodes/@count is taking the time.
>
> let $allKeywords := fn:collection()/doc/keywords/keyword/@value
> let $distinct := fn:distinct-values($allKeywords)
>
> for $k in $distinct
>  return
>    let $search := cts:element-attribute-word-query(fn:QName("",
> "keyword"), fn:QName("", "value"), $k)
>    let $results := cts:search(fn:collection()/doc/keywords/keyword,
> $search)
>    let $nodes := $results[@value eq $k]
>    let $counts := $nodes/@count
>       return
>          <keyword value="{$k}" total="{fn:sum($counts)}" />
>
>
> On Thu, Nov 13, 2008 at 9:21 AM, Steve <[EMAIL PROTECTED]>
> wrote:
>> I've been applying what I've learnt so far from this thread, but I'm
>> having a bit of trouble getting good performance when I put it all
>> together. The query I'm trying to execute in order to get the sum of
>> a count of keywords is below:
>>
>> let $allKeywords := fn:collection()/doc/keywords/keyword/@value
>> let $distinct := fn:distinct-values($allKeywords)
>>
>> for $k in $distinct
>>  return
>>    let $search := cts:element-attribute-word-query(fn:QName("",
>> "keyword"), fn:QName("", "value"), $k)
>>    let $results := cts:search(fn:collection()/doc/keywords/keyword,
>> $search)/@count
>>       return
>> fn:sum($results)
>>
>> Running profiling on the query shows me that it's the XPath stuff I
>> do on the search results that's holding everything up. Can anyone
>> advise how I can improve this?
>>
>> Thanks
>>
>> On Wed, Nov 12, 2008 at 7:24 PM, Michael Blakeley 
>> <[EMAIL PROTECTED]> wrote:
>>> To be fair, absorbing the architecture and indexing behavior of a 
>>> modern RDBMS isn't trivial either. XML content adds another 
>>> dimension, but I hope you find the performance guide at 
>>> http://developer.marklogic.com/pubs/4.0/
>>> helpful. There are also useful bits of server architecture
>>> discussion in the dev and admin guides.
>>>
>>> In the general case I wouldn't expect adding a range index to
>>> greatly improve value query performance. The list cache is pretty
>>> efficient at keeping frequently-used terms in memory.
>>>
>>> Usually the range indexes are created for applications that need
>>> particular features: fast sorting on a node value, fast range
>>> queries, fast access to distinct values, etc.
>>>
>>> -- Mike
>>>
>>> Whitby, Rob, CMG wrote:
>>>>
>>>> Wow, I didn't realise that. It will improve performance though,
>>>> right? On a large database I assume the index of all XML elements
>>>> and attributes can't be held in memory.
>>>>
>>>> Understanding how the functions relate to the indexes is probably
>>>> one of the areas I've found hardest with MarkLogic.
>>>>
>>>> Thanks
>>>> Rob
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: [EMAIL PROTECTED]
>>>> [mailto:[EMAIL PROTECTED] On Behalf Of 
>>>> Michael Blakeley
>>>> Sent: 12 November 2008 16:55
>>>> To: General Mark Logic Developer Discussion
>>>> Cc: [EMAIL PROTECTED]
>>>> Subject: Re: [MarkLogic Dev General] Improving XPath Performance
>>>> on SearchResults
>>>>
>>>> Actually that query does *not* require any special indexes. The
>>>> server always indexes all XML element values and element-attribute
>>>> values.
>>>>
>>>> You would only need an attribute range index for a fast "order by"
>>>> on keyword/@value, or for a cts:attribute-value-range-query term,
>>>> or for cts:element-attribute-values() and its associated functions.
>>>>
>>>> -- Mike
>>>>
>>>> Whitby, Rob, CMG wrote:
>>>>>
>>>>> If you put an attribute range index on keyword/@value you can do 
>>>>> something like this:
>>>>>
>>>>> cts:search(
>>>>>  /doc/classifications/classification,
>>>>>  cts:element-attribute-value-query(xs:QName("keyword"),
>>>>> xs:QName("value"), "something")
>>>>> )
>>>>>
>>>>> (untested!)
>>>>>
>>>>> Rob
>>>>>
>>>>> -----Original Message-----
>>>>> From: [EMAIL PROTECTED]
>>>>> [mailto:[EMAIL PROTECTED] On Behalf Of 
>>>>> Steve
>>>>> Sent: 12 November 2008 14:41
>>>>> To: James Clippinger
>>>>> Cc: General Mark Logic Developer Discussion
>>>>> Subject: Re: [MarkLogic Dev General] Improving XPath Performance
>>>>> on SearchResults
>>>>>
>>>>> I should probably add that I'm trying to extract all
>>>>> classification values for the documents that have a specific
>>>>> keyword value.
>>>>>
>>>>> On Wed, Nov 12, 2008 at 2:40 PM, Steve <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>>
>>>>>> Thanks for your response.
>>>>>>
>>>>>> I've tried your suggestion and it doesn't really help. Looking at
>>>>>> the profiling document, I can see that it's clearly the XPath on
>>>>>> the document results that is slowing the whole thing down. Are
>>>>>> there any other ways that I can improve this? I've included a
>>>>>> sample document (small), so you can see what I'm trying to achieve.
>>>>>>
>>>>>> <doc>
>>>>>>  <classifications>
>>>>>>   <classification value="123" />
>>>>>>   <classification value="324" />
>>>>>>  </classifications>
>>>>>>  <keywords>
>>>>>>   <keyword value="word1" />
>>>>>>   <keyword value="word2" />
>>>>>>
>>>>>>  </keywords>
>>>>>> </doc>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 12, 2008 at 2:24 PM, James Clippinger 
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>> Steve, your query is doing some heavyweight filtering for the 
>>>>>>> XPath because it is doing two steps:
>>>>>>>
>>>>>>> 1) Execute the cts:search(): generate a list of all documents 
>>>>>>> matching the query in relevance order.
>>>>>>>
>>>>>>> 2) Execute the XPath: reorder the documents into document order 
>>>>>>> and find only those with /doc/classifications/classification
>>>>>>> elements, returning those classification elements.
>>>>>>>
>>>>>>> Since you are using XPath and thus returning results in document
>>>>>>> order, you probably want to use cts:contains() in an XPath
>>>>>>> predicate rather than cts:search().  cts:contains() in a rooted
>>>>>>> XPath expression will use the search indexes when appropriate, so
>>>>>>> it's as fast as the equivalent cts:search() expression.  Try this:
>>>>>>>
>>>>>>> let $search := cts:element-attribute-word-query(fn:QName("",
>>>>>>> "keyword"), fn:QName("", "value"), "something")
>>>>>>> return fn:collection()/doc[cts:contains(.,
>>>>>>> $search)]/classifications/classification
>>>>>>>
>>>>>>> James
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: [EMAIL PROTECTED]
>>>>>>>> [mailto:[EMAIL PROTECTED] On Behalf Of 
>>>>>>>> Steve
>>>>>>>> Sent: Wednesday, November 12, 2008 8:54 AM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: [MarkLogic Dev General] Improving XPath Performance on
>>>>>>>> SearchResults
>>>>>>>>
>>>>>>>> I've written a query which I use to search my data set and I am
>>>>>>>> able to get the results back very quickly. However, the results
>>>>>>>> that I get back show the complete document that the search
>>>>>>>> matched, whereas I want a particular node returned.
>>>>>>>> At the moment I'm doing this:
>>>>>>>>
>>>>>>>> let $search := cts:element-attribute-word-query(fn:QName("",
>>>>>>>> "keyword"), fn:QName("", "value"), "something")
>>>>>>>> let $results := cts:search(fn:collection(),
>>>>>>>> $search)/doc/classifications/classification
>>>>>>>>   return $results
>>>>>>>>
>>>>>>>> I've tried profiling this query and I've found that there is a 
>>>>>>>> big lag filtering the $results of the search using XPath.
>>>>>>>> Is there any way, either through using a different query or
>>>>>>>> search notation, or by indexes etc., that I can speed this up?
>>>>>>>>
>>>>>>>> Thanks in advance...
>>>>>>>> _______________________________________________
>>>>>>>> General mailing list
>>>>>>>> [email protected] 
>>>>>>>> http://xqzone.com/mailman/listinfo/general
>>>>>>>>
>>>>
>>>
>>>
>>
>
