Re: [MarkLogic Dev General] Fast order-by with multiple range indexes?

Will Sawyer Fri, 02 Aug 2013 12:06:17 -0700

I am having trouble with the same thing: the path range index is not being 
used in the order by.


I have a scheduled publish date and an actual published date.  I created a 
dateTime path range index on 
"//(ldse:schedule-publish/@dateTime|ldse:publish-date/@date)"
but the following query still throws an expanded tree cache error because it's 
returning every document.  I'm running ML 6.0-3.2.

(
        for $item in cts:search(fn:collection(), cts:and-query(()), "unfiltered")
        order by $item//(ldse:schedule-publish/@dateTime|ldse:publish-date/@date) descending
        return (
            $item
        )
)[1 to 25]
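
A possible workaround, sketched under assumptions rather than verified on 
6.0-3.2: read the newest sort keys straight from the path range index with 
cts:values, then search only for documents at or above the oldest of those 
keys, so the in-memory sort touches a small set.  The ldse namespace URI below 
is a placeholder (the real one is not given here), the path string is assumed 
to match the index definition exactly, and each document is assumed to carry 
one indexed value.

```xquery
xquery version "1.0-ml";

(: Assumption: placeholder namespace URI; substitute the real ldse URI. :)
declare namespace ldse = "http://example.com/ldse";

(: Assumption: this string matches the path range index definition exactly. :)
let $path := "//(ldse:schedule-publish/@dateTime|ldse:publish-date/@date)"

(: Read the 25 newest distinct values from the index; no documents are loaded. :)
let $newest := cts:values(cts:path-reference($path), (), ("descending", "limit=25"))

(: The oldest of those values is a lower bound for the 25 newest documents. :)
let $cutoff := fn:min($newest)
return
  if (fn:empty($cutoff)) then ()
  else
    (
      for $item in cts:search(
        fn:collection(),
        cts:path-range-query($path, ">=", $cutoff),
        "unfiltered")
      (: [1] guards against a document that carries both attributes. :)
      order by ($item//(ldse:schedule-publish/@dateTime|ldse:publish-date/@date))[1] descending
      return $item
    )[1 to 25]
```

If many documents share the same date values, the range query can still match 
far more than 25 documents, so this narrows the problem rather than removing it.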

-Will


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Friday, August 02, 2013 11:49 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fast order-by with multiple range indexes?

With ML6 I think you could create a single, useful path range index on both 
elements. There is a table of permitted XPath syntax at 
http://docs.marklogic.com/guide/admin/range_index#id_54948 and syntax like 
/a/(b|c) is supported.
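
As a rough sketch of what defining such an index might look like through the 
Admin API: the database name "Documents" and the "date" scalar type are 
assumptions for illustration, and /a/(b|c) is only the placeholder path from 
above, not a real schema.  Any namespace prefixes used in a real path would 
also need to be declared in the database's path namespace settings.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
(: Assumption: hypothetical database name. :)
let $db := xdmp:database("Documents")
let $index := admin:database-range-path-index(
  $db,
  "date",        (: scalar type of the indexed values :)
  "/a/(b|c)",    (: placeholder path covering both elements :)
  "",            (: collation; only meaningful for string indexes :)
  fn:false(),    (: range value positions :)
  "ignore")      (: how to treat invalid values: "ignore" or "reject" :)
let $config := admin:database-add-range-path-index($config, $db, $index)
return admin:save-configuration($config)
```

Adding a range index triggers reindexing if reindexing is enabled on the 
database, so on a large database the index will not be usable immediately.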

So your answer might be "this will be slow now, but we have a plan to make it 
faster when we upgrade".

-- Mike

On 2 Aug 2013, at 10:40 , Ron Hitchens <[email protected]> wrote:

> 
>   This one is on 5.x.  I plan to move them to 6.x for the next phase 
> (where everything goes to a consistent model), but it's not possible 
> now.
> 
>   If you know a way on 6.x I'd love to see it.  We can't use path 
> range indexes, but how would that work?  Something like this?:
> 
> order by xs:date ($result/(old/path/date|new/path/m:sort-date))
> 
> On Aug 2, 2013, at 6:13 PM, Michael Blakeley <[email protected]> wrote:
> 
>> Which release?
>> 
>> So you're using an element-range index? What about using a path-range index?
>> 
>> -- Mike
>> 
>> On 2 Aug 2013, at 09:26 , Ron Hitchens <[email protected]> wrote:
>> 
>>> 
>>> I have a sorting problem that I can't find a good solution for.  I'm 
>>> working on a project where a lot of content exists in one form which 
>>> was not designed for efficient searching or sorting.
>>> In fact, MarkLogic is not used at all for search at the moment, 
>>> that's what I'm adding.
>>> 
>>> This existing format has multiple versions of the content in each 
>>> document, with an element range index on an xs:date field.
>>> I can do efficient sorts on this content alone using the ranged date 
>>> field in an "order by" clause.
>>> 
>>> Here's the complication: a new type of content is being added in a 
>>> newer, more MarkLogic-friendly schema.  These documents all have a 
>>> common metadata section with a ranged date field.  Different name 
>>> and namespace, but serving the same purpose.
>>> 
>>> My problem is that I need to do searches across both types of 
>>> content and sort them together.  Searching one kind or the other and 
>>> sorting by their respective date fields works great for massive 
>>> result sets.  But doing them together blows the expanded tree cache 
>>> if the result set is large.
>>> 
>>> Because of the odd layout of the old content, my searchable 
>>> expression is rather funky and looks something like this:
>>> 
>>> cts:search 
>>> (fn:doc()/(/container/group[@state="live"]/doc[fn:not(@foo)]
>>>   |x:new1|x:new2), $q, "unfiltered")
>>> 
>>> Note that the first one returns a sub-element of the document, which 
>>> is actually a fragment root.  The other two on the end return root 
>>> elements.
>>> 
>>> A FLWOR like this doesn't work:
>>> 
>>> for $result in cts:search ( . . .)
>>> order by xs:date (($result/old/path/date, 
>>>   $result/new/path/m:sort-date)[1])
>>> return $result
>>> 
>>> It runs OK and does the right thing if the result set is 
>>> reasonably small (a few thousand), but it will blow the cache if there 
>>> are too many results.  Trying to ignore one of the dates also blows the 
>>> cache:
>>> 
>>> for $result in cts:search ( . . .)
>>> order by xs:date ($result/old/path/date)
>>> return $result
>>> 
>>> But removing the last two components of the XPath (|x:new1|x:new2) 
>>> will then run fast.  I'm not sure why this prevents the range index 
>>> from kicking in, probably because of the complexity of the XPath.
>>> 
>>> Sorting combined results by relevance in either direction is fast.
>>> 
>>> Does anyone have a voodoo trick to enable fast sorting using values 
>>> from two different range indexes?  I don't need to look into the 
>>> documents to get the sort keys, so it seems like it shouldn't have to 
>>> consume expanded tree cache space for this.
>>> 
>>> ---
>>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>>   +44 7879 358 212 (voice)          http://www.ronsoft.com
>>>   +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
>>> "No amount of belief establishes any fact." -Unknown
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>>> 
