This is where an index really is useful.
for $keyword in cts:element-attribute-values(xs:QName("keyword"),
xs:QName("value"))
let $count := cts:frequency($keyword)
order by $count descending
return <keyword value="{$keyword}" count="{$count}"/>
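As a rough sketch of the same idea, the lexicon call can be asked to return the values already ordered by frequency, which avoids the separate "order by". This assumes an attribute range index already exists on keyword/@value (cts:element-attribute-values() needs one) and that the elements are in no namespace. The "frequency-order" and "descending" option strings are my understanding of the lexicon options, so please check them against the documentation for your server version. Also note that cts:frequency(), as far as I know, reports how many fragments contain each value by default, which is not the same thing as summing @count attributes.

```xquery
xquery version "1.0-ml";
(: Sketch only: requires an attribute range index on keyword/@value. :)
(: The empty sequence is the optional start value; the option names   :)
(: are assumptions to verify against your server's documentation.     :)
for $keyword in cts:element-attribute-values(
    xs:QName("keyword"), xs:QName("value"),
    (), ("frequency-order", "descending"))
return <keyword value="{$keyword}" count="{cts:frequency($keyword)}"/>
```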
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve
Sent: 13 November 2008 10:13
To: Michael Blakeley
Cc: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Improving XPath Performance on
SearchResults
Sorry if I seem to keep moving the goalposts. I've been playing with
some of the parameters, and experimenting with my XQuery and the best
performance I can get is using the following query. Unfortunately, it
still takes a considerable amount of time to execute. Profiling shows
that the XPath $nodes/@count is taking the time.
let $allKeywords := fn:collection()/doc/keywords/keyword/@value
let $distinct := fn:distinct-values($allKeywords)
for $k in $distinct
return
let $search := cts:element-attribute-word-query(fn:QName("",
"keyword"), fn:QName("", "value"), $k)
let $results := cts:search(fn:collection()/doc/keywords/keyword,
$search)
let $nodes := $results[@value eq $k]
let $counts := $nodes/@count
return
<keyword value="{$k}" total="{fn:sum($counts)}" />
On Thu, Nov 13, 2008 at 9:21 AM, Steve <[EMAIL PROTECTED]>
wrote:
> I've been applying what I've learnt so far from this thread, but I'm
> having a bit of trouble getting good performance when I put it all
> together. The query I'm trying to execute in order to get the sum of a
> count of keywords is below:
>
> let $allKeywords := fn:collection()/doc/keywords/keyword/@value
> let $distinct := fn:distinct-values($allKeywords)
>
> for $k in $distinct
> return
> let $search := cts:element-attribute-word-query(fn:QName("",
> "keyword"), fn:QName("", "value"), $k)
> let $results := cts:search(fn:collection()/doc/keywords/keyword,
> $search)/@count
> return
> fn:sum($results)
>
> Running profiling on the query shows me that it's the XPath stuff I do
> on the search results that's holding everything up. Can anyone advise
> how I can improve this?
>
> Thanks
>
> On Wed, Nov 12, 2008 at 7:24 PM, Michael Blakeley
> <[EMAIL PROTECTED]> wrote:
>> To be fair, absorbing the architecture and indexing behavior of a
>> modern RDBMS isn't trivial either. XML content adds another
>> dimension, but I hope you find the performance guide at
>> http://developer.marklogic.com/pubs/4.0/
>> helpful. There are also useful bits of server architecture discussion
>> in the dev and admin guides.
>>
>> In the general case I wouldn't expect adding a range index to greatly
>> improve value query performance. The list cache is pretty efficient
>> at keeping frequently-used terms in memory.
>>
>> Usually the range indexes are created for applications that need
>> particular features: fast sorting on a node value, fast range
>> queries, fast access to distinct values, etc.
>>
>> -- Mike
>>
>> Whitby, Rob, CMG wrote:
>>>
>>> Wow, I didn't realise that. It will improve performance though,
>>> right? On a large database I assume the index of all XML elements
>>> and attributes can't be held in memory.
>>>
>>> Understanding how the functions relate to the indexes is probably
>>> one of the areas I've found hardest with MarkLogic.
>>>
>>> Thanks
>>> Rob
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED]
>>> [mailto:[EMAIL PROTECTED] On Behalf Of
>>> Michael Blakeley
>>> Sent: 12 November 2008 16:55
>>> To: General Mark Logic Developer Discussion
>>> Cc: [EMAIL PROTECTED]
>>> Subject: Re: [MarkLogic Dev General] Improving XPath Performance on
>>> SearchResults
>>>
>>> Actually that query does *not* require any special indexes. The
>>> server always indexes all XML element values and element-attribute
>>> values.
>>>
>>> You would only need an attribute range index for a fast "order by"
>>> on keyword/@value, or for a cts:element-attribute-range-query term,
>>> or for cts:element-attribute-values() and its associated functions.
>>>
>>> -- Mike
>>>
>>> Whitby, Rob, CMG wrote:
>>>>
>>>> If you put an attribute range index on keyword/@value you can do
>>>> something like this:
>>>>
>>>> cts:search(
>>>> /doc/classifications/classification,
>>>> cts:element-attribute-value-query(xs:QName("keyword"),
>>>> xs:QName("value"), "something")
>>>> )
>>>>
>>>> (untested!)
>>>>
>>>> Rob
>>>>
>>>> -----Original Message-----
>>>> From: [EMAIL PROTECTED]
>>>> [mailto:[EMAIL PROTECTED] On Behalf Of Steve
>>>> Sent: 12 November 2008 14:41
>>>> To: James Clippinger
>>>> Cc: General Mark Logic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] Improving XPath Performance on
>>>> SearchResults
>>>>
>>>> I should probably add that I'm trying to extract all classification
>>>> values for the documents that have a specific keyword value.
>>>>
>>>> On Wed, Nov 12, 2008 at 2:40 PM, Steve <[EMAIL PROTECTED]>
>>>> wrote:
>>>>>
>>>>> Thanks for your response.
>>>>>
>>>>> I've tried your suggestion and it doesn't really help. Looking at
>>>>> the profiling document, I can see that it's clearly the XPath on
>>>>> the document results that is slowing the whole thing down. Are
>>>>> there any other ways that I can improve this? I've included a
>>>>> sample document (small), so you can see what I'm trying to achieve.
>>>>>
>>>>> <doc>
>>>>> <classifications>
>>>>> <classification value="123" />
>>>>> <classification value="324" />
>>>>> </classifications>
>>>>> <keywords>
>>>>> <keyword value="word1" />
>>>>> <keyword value="word2" />
>>>>>
>>>>> </keywords>
>>>>> </doc>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 12, 2008 at 2:24 PM, James Clippinger
>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> Steve, your query is doing some heavyweight filtering for the
>>>>>> XPath because it is doing two steps:
>>>>>>
>>>>>> 1) Execute the cts:search(): generate a list of all documents
>>>>>> matching the query in relevance order.
>>>>>>
>>>>>> 2) Execute the XPath: reorder the documents into document order
>>>>>> and find only those with /doc/classifications/classification
>>>>>> elements, returning those classification elements.
>>>>>>
>>>>>> Since you are using XPath and thus returning results in document
>>>>>> order, you probably want to use cts:contains() in an XPath
>>>>>> predicate rather than cts:search(). cts:contains() in a rooted
>>>>>> XPath expression will use the search indexes when appropriate, so
>>>>>> it's as fast as the equivalent cts:search() expression. Try this:
>>>>>>
>>>>>> let $search := cts:element-attribute-word-query(fn:QName("",
>>>>>> "keyword"), fn:QName("", "value"), "something")
>>>>>> return fn:collection()/doc[cts:contains(.,
>>>>>> $search)]/classifications/classification
>>>>>>
>>>>>> James
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [EMAIL PROTECTED]
>>>>>>> [mailto:[EMAIL PROTECTED] On Behalf Of
>>>>>>> Steve
>>>>>>> Sent: Wednesday, November 12, 2008 8:54 AM
>>>>>>> To: [email protected]
>>>>>>> Subject: [MarkLogic Dev General] Improving XPath Performance on
>>>>>>> SearchResults
>>>>>>>
>>>>>>> I've written a query which I use to search my data set and I am
>>>>>>> able to get the results back very quickly. However the results
>>>>>>> that I get back show the complete document that the search
>>>>>>> matched, whereas I want a particular node returned.
>>>>>>> At the moment I'm doing this:
>>>>>>>
>>>>>>> let $search := cts:element-attribute-word-query(fn:QName("",
>>>>>>> "keyword"), fn:QName("", "value"), "something")
>>>>>>> let $results := cts:search(fn:collection(),
>>>>>>> $search)/doc/classifications/classification
>>>>>>> return $results
>>>>>>>
>>>>>>> I've tried profiling this query and I've found that there is a
>>>>>>> big lag filtering the $results of the search using XPath.
>>>>>>> Is there any way, either through using a different query or
>>>>>>> search notation, or by indexes etc., that I can speed this up?
>>>>>>>
>>>>>>> Thanks in advance...
>>>>>>> _______________________________________________
>>>>>>> General mailing list
>>>>>>> [email protected]
>>>>>>> http://xqzone.com/mailman/listinfo/general
>>>>>>>
>>>
>>
>>
>
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general