I've looked at the query and tried to execute it, but I get an error
telling me that the argument passed to cts:frequency(..) is not of
type item(). And looking at the query, won't that return the count of
<keyword> elements rather than the sum of the count attributes of each
distinct keyword element?
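
To show what I mean, here is a minimal sketch in plain XQuery. The two
inline documents and their count values are made up for illustration,
and the "occurrences" attribute is just a name I picked. It doesn't use
any indexes, so it says nothing about performance; it only shows that
counting keyword elements and summing their count attributes give
different answers.

```xquery
xquery version "1.0";

(: Hypothetical sample data: two documents sharing the keyword "word1". :)
let $docs := (
  <doc>
    <keywords>
      <keyword value="word1" count="3"/>
      <keyword value="word2" count="1"/>
    </keywords>
  </doc>,
  <doc>
    <keywords>
      <keyword value="word1" count="2"/>
    </keywords>
  </doc>
)
let $keywords := $docs/keywords/keyword
for $k in fn:distinct-values($keywords/@value)
(: All keyword elements that carry this value, across both documents. :)
let $matches := $keywords[@value eq $k]
order by $k
return
  <keyword value="{$k}"
           occurrences="{fn:count($matches)}"
           total="{fn:sum($matches/@count)}"/>
```

With this sample data, word1 appears in two keyword elements but its
count attributes add up to 5, and that total is the number I'm after.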
On Thu, Nov 13, 2008 at 10:22 AM, Whitby, Rob, CMG
<[EMAIL PROTECTED]> wrote:
> This is where an index really is useful..
>
> for $keyword in cts:element-attribute-values(xs:QName("keyword"),
> xs:QName("value"))
> let $count := cts:frequency($keyword)
> order by $count descending
> return <keyword value="{$keyword}" count="{$count}"/>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Steve
> Sent: 13 November 2008 10:13
> To: Michael Blakeley
> Cc: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Improving XPath Performance on
> Search Results
>
> Sorry if I seem to keep moving the goalposts. I've been playing with
> some of the parameters, and experimenting with my XQuery and the best
> performance I can get is using the following query. Unfortunately, it
> still takes a considerable amount of time to execute. Profiling shows
> that the XPath $nodes/@count is taking the time.
>
> let $allKeywords := fn:collection()/doc/keywords/keyword/@value
> let $distinct := fn:distinct-values($allKeywords)
>
> for $k in $distinct
> return
> let $search := cts:element-attribute-word-query(fn:QName("",
> "keyword"), fn:QName("", "value"), $k)
> let $results := cts:search(fn:collection()/doc/keywords/keyword,
> $search)
> let $nodes := $results[@value eq $k]
> let $counts := $nodes/@count
> return
> <keyword value="{$k}" total="{fn:sum($counts)}" />
>
>
> On Thu, Nov 13, 2008 at 9:21 AM, Steve <[EMAIL PROTECTED]>
> wrote:
>> I've been applying what I've learnt so far from this thread, but I'm
>> having a bit of trouble getting good performance when I put it all
>> together. The query I'm using to sum the count attributes for each
>> keyword is below:
>>
>> let $allKeywords := fn:collection()/doc/keywords/keyword/@value
>> let $distinct := fn:distinct-values($allKeywords)
>>
>> for $k in $distinct
>> return
>> let $search := cts:element-attribute-word-query(fn:QName("",
>> "keyword"), fn:QName("", "value"), $k)
>> let $results := cts:search(fn:collection()/doc/keywords/keyword,
>> $search)/@count
>> return
>> fn:sum($results)
>>
>> Running profiling on the query shows me that it's the XPath stuff I do
>> on the search results that's holding everything up. Can anyone advise
>> how I can improve this?
>>
>> Thanks
>>
>> On Wed, Nov 12, 2008 at 7:24 PM, Michael Blakeley
>> <[EMAIL PROTECTED]> wrote:
>>> To be fair, absorbing the architecture and indexing behavior of a
>>> modern RDBMS isn't trivial either. XML content adds another
>>> dimension, but I hope you find the performance guide at
>>> http://developer.marklogic.com/pubs/4.0/
>>> helpful. There are also useful bits of server architecture discussion
>>> in the dev and admin guides.
>>>
>>> In the general case I wouldn't expect adding a range index to greatly
>>> improve value query performance. The list cache is pretty efficient
>>> at keeping frequently-used terms in memory.
>>>
>>> Usually the range indexes are created for applications that need
>>> particular features: fast sorting on a node value, fast range queries,
>>> fast access to distinct values, etc.
>>>
>>> -- Mike
>>>
>>> Whitby, Rob, CMG wrote:
>>>>
>>>> Wow, I didn't realise that. It will improve performance though,
>>>> right? On a large database I assume the index of all XML elements
>>>> and attributes can't be held in memory.
>>>>
>>>> Understanding how the functions relate to the indexes is probably
>>>> one of the areas I've found hardest with MarkLogic.
>>>>
>>>> Thanks
>>>> Rob
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: [EMAIL PROTECTED]
>>>> [mailto:[EMAIL PROTECTED] On Behalf Of
>>>> Michael Blakeley
>>>> Sent: 12 November 2008 16:55
>>>> To: General Mark Logic Developer Discussion
>>>> Cc: [EMAIL PROTECTED]
>>>> Subject: Re: [MarkLogic Dev General] Improving XPath Performance on
>>>> Search Results
>>>>
>>>> Actually that query does *not* require any special indexes. The
>>>> server always indexes all XML element values and element-attribute
>>>> values.
>>>>
>>>> You would only need an attribute range index for a fast "order by"
>>>> on keyword/@value, or for a cts:element-attribute-range-query term,
>>>> or for cts:element-attribute-values() and its associated functions.
>>>>
>>>> -- Mike
>>>>
>>>> Whitby, Rob, CMG wrote:
>>>>>
>>>>> If you put an attribute range index on keyword/@value you can do
>>>>> something like this:
>>>>>
>>>>> cts:search(
>>>>> /doc/classifications/classification,
>>>>> cts:element-attribute-value-query(xs:QName("keyword"),
>>>>> xs:QName("value"), "something")
>>>>> )
>>>>>
>>>>> (untested!)
>>>>>
>>>>> Rob
>>>>>
>>>>> -----Original Message-----
>>>>> From: [EMAIL PROTECTED]
>>>>> [mailto:[EMAIL PROTECTED] On Behalf Of Steve
>>>>> Sent: 12 November 2008 14:41
>>>>> To: James Clippinger
>>>>> Cc: General Mark Logic Developer Discussion
>>>>> Subject: Re: [MarkLogic Dev General] Improving XPath Performance on
>>>>> Search Results
>>>>>
>>>>> I should probably add that I'm trying to extract all classification
>>>>> values for the documents that have a specific keyword value.
>>>>>
>>>>> On Wed, Nov 12, 2008 at 2:40 PM, Steve <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>>
>>>>>> Thanks for your response.
>>>>>>
>>>>>> I've tried your suggestion and it doesn't really help. Looking at
>>>>>> the profiling document, I can see that it's clearly the XPath on the
>>>>>> document results that is slowing the whole thing down. Are there any
>>>>>> other ways that I can improve this? I've included a sample
>>>>>> document (small), so you can see what I'm trying to achieve.
>>>>>>
>>>>>> <doc>
>>>>>> <classifications>
>>>>>> <classification value="123" />
>>>>>> <classification value="324" />
>>>>>> </classifications>
>>>>>> <keywords>
>>>>>> <keyword value="word1" />
>>>>>> <keyword value="word2" />
>>>>>>
>>>>>> </keywords>
>>>>>> </doc>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 12, 2008 at 2:24 PM, James Clippinger
>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>> Steve, your query is doing some heavyweight filtering for the
>>>>>>> XPath because it is doing two steps:
>>>>>>>
>>>>>>> 1) Execute the cts:search(): generate a list of all documents
>>>>>>> matching the query in relevance order.
>>>>>>>
>>>>>>> 2) Execute the XPath: reorder the documents into document order
>>>>>>> and find only those with /doc/classifications/classification
>>>>>>> elements, returning those classification elements.
>>>>>>>
>>>>>>> Since you are using XPath and thus returning results in document
>>>>>>> order, you probably want to use cts:contains() in an XPath
>>>>>>> predicate rather than cts:search(). cts:contains() in a rooted XPath
>>>>>>> expression will use the search indexes when appropriate, so it's
>>>>>>> as fast as the equivalent cts:search() expression. Try this:
>>>>>>>
>>>>>>> let $search := cts:element-attribute-word-query(fn:QName("",
>>>>>>> "keyword"), fn:QName("", "value"), "something")
>>>>>>> return fn:collection()/doc[cts:contains(.,
>>>>>>> $search)]/classifications/classification
>>>>>>>
>>>>>>> James
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: [EMAIL PROTECTED]
>>>>>>>> [mailto:[EMAIL PROTECTED] On Behalf Of
>>>>>>>> Steve
>>>>>>>> Sent: Wednesday, November 12, 2008 8:54 AM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: [MarkLogic Dev General] Improving XPath Performance on
>>>>>>>> Search Results
>>>>>>>>
>>>>>>>> I've written a query which I use to search my data set and I am
>>>>>>>> able to get the results back very quickly. However the results
>>>>>>>> that I get back show the complete document that the search
>>>>>>>> matched, whereas I want a particular node returned.
>>>>>>>> At the moment I'm doing this:
>>>>>>>>
>>>>>>>> let $search := cts:element-attribute-word-query(fn:QName("",
>>>>>>>> "keyword"), fn:QName("", "value"), "something")
>>>>>>>> let $results := cts:search(fn:collection(),
>>>>>>>> $search)/doc/classifications/classification
>>>>>>>> return $results
>>>>>>>>
>>>>>>>> I've tried profiling this query and I've found that there is a
>>>>>>>> big lag filtering the $results of the search using XPath.
>>>>>>>> Is there any way, either through a different query or search
>>>>>>>> notation, or through indexes, that I can speed this up?
>>>>>>>>
>>>>>>>> Thanks in advance...
>>>>>>>> _______________________________________________
>>>>>>>> General mailing list
>>>>>>>> [email protected]
>>>>>>>> http://xqzone.com/mailman/listinfo/general
>>>>>>>>