Re: [MarkLogic Dev General] Fast order-by with multiple range indexes?

Gajanan Chinchwadkar Fri, 02 Aug 2013 12:46:11 -0700

In general I agree with Mike that you will be able to use an index of the form 
/a/(b|c).


But creating a good path range index for 
$result/(old/path/date|new/path/m:sort-date) may not be easy. ML 6 doesn't 
allow you to create an index with a top-level grouping operator, i.e. 
(old/path/date|new/path/m:sort-date). You can create an index of the form, say, 
indexroot/(old/path/date|new/path/m:sort-date). But then, for fast order-by to 
use that index, the "indexroot" must match against $result (in fact, 
it should be a leaf of $result).
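
To make the "indexroot must match $result" condition concrete, here is a rough 
sketch of the query shape I have in mind. It is untested and rests on several 
assumptions: "indexroot" is a placeholder element name, the namespace URI bound 
to "m" and the word query in $q are made up for illustration, and a date path 
range index on indexroot/(old/path/date|new/path/m:sort-date) is assumed to 
exist already. Whether the optimizer actually picks up the index needs to be 
verified on a real 6.x database.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace URI; the same prefix binding would also have to be
   defined as a path namespace in the database configuration. :)
declare namespace m = "http://example.com/metadata";

(: Placeholder query, for illustration only. :)
let $q := cts:word-query("example")

(: The searchable expression ends in indexroot, so each $result is an
   indexroot element, and the order-by path is the indexed path relative
   to it. :)
for $result in cts:search(/indexroot, $q, "unfiltered")
order by xs:date($result/(old/path/date|new/path/m:sort-date))
return $result
```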

It would be a good idea for Ron to contact Stephen Buxton and submit an RFE.

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Friday, August 02, 2013 10:49 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Fast order-by with multiple range indexes?

With ML6 I think you could create a single, useful path range index on both 
elements. There is a table of permitted XPath syntax at 
http://docs.marklogic.com/guide/admin/range_index#id_54948 and syntax like 
/a/(b|c) is supported.

So your answer might be "this will be slow now, but we have a plan to make it 
faster when we upgrade".
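
For reference, a minimal sketch of creating such an index with the Admin API 
on 6.x. This is untested here: "Documents" is a placeholder database name, 
/a/(b|c) is only the illustrative path from above, and the "date" scalar type 
and "reject" setting are assumptions for this case. If the real path uses 
namespace prefixes, those would also have to be defined as path namespaces 
on the database first.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db := xdmp:database("Documents")   (: placeholder database name :)
let $index := admin:database-range-path-index(
    $db,
    "date",        (: scalar type :)
    "/a/(b|c)",    (: path expression, illustrative only :)
    "",            (: collation, not used for date values :)
    fn:false(),    (: range value positions :)
    "reject")      (: how to treat invalid values :)
let $config := admin:database-add-range-path-index($config, $db, $index)
return admin:save-configuration($config)
```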

-- Mike

On 2 Aug 2013, at 10:40 , Ron Hitchens <[email protected]> wrote:

> 
>   This one is on 5.x.  I plan to move them to 6.x for the next phase 
> (where everything goes to a consistent model), but it's not possible 
> now.
> 
>   If you know a way on 6.x I'd love to see it.  We can't use path 
> range indexes, but how would that work?  Something like this?:
> 
> order by xs:date ($result/(old/path/date|new/path/m:sort-date))
> 
> On Aug 2, 2013, at 6:13 PM, Michael Blakeley <[email protected]> wrote:
> 
>> Which release?
>> 
>> So you're using an element-range index? What about using a path-range index?
>> 
>> -- Mike
>> 
>> On 2 Aug 2013, at 09:26 , Ron Hitchens <[email protected]> wrote:
>> 
>>> 
>>> I have a sorting problem that I can't find a good solution for.  I'm 
>>> working on a project where a lot of content exists in one form which 
>>> was not designed for efficient searching or sorting.
>>> In fact, MarkLogic is not used at all for search at the moment; 
>>> that's what I'm adding.
>>> 
>>> This existing format has multiple versions of the content in each 
>>> document, with an element range index on an xs:date field.
>>> I can do efficient sorts on this content alone using the ranged date 
>>> field in an "order by" clause.
>>> 
>>> Here's the complication: a new type of content is being added in a 
>>> newer, more MarkLogic-friendly schema.  These documents all have a 
>>> common metadata section with a ranged date field.  Different name 
>>> and namespace, but serving the same purpose.
>>> 
>>> My problem is that I need to do searches across both types of 
>>> content and sort them together.  Searching one kind or the other and 
>>> sorting by their respective date fields works great for massive 
>>> result sets.  But doing them together blows the expanded tree cache 
>>> if the result set is large.
>>> 
>>> Because of the odd layout of the old content, my searchable 
>>> expression is rather funky and looks something like this:
>>> 
>>> cts:search
>>> (fn:doc()/(/container/group[@state="live"]/doc[fn:not(@foo)]|x:new1|x:new2),
>>> $q, "unfiltered")
>>> 
>>> Note that the first one returns a sub-element of the document, which 
>>> is actually a fragment root.  The other two on the end return root 
>>> elements.
>>> 
>>> A FLWOR like this doesn't work:
>>> 
>>> for $result in cts:search ( . . .)
>>> order by xs:date (($result/old/path/date,
>>>                    $result/new/path/m:sort-date)[1])
>>> return $result
>>> 
>>> It runs OK and will do the right thing if the result set is 
>>> reasonably small (a few thousand), but will blow the cache if there are 
>>> too many results.  Trying to ignore one of the dates also blows the 
>>> cache:
>>> 
>>> for $result in cts:search ( . . .)
>>> order by xs:date ($result/old/path/date)
>>> return $result
>>> 
>>> But removing the last two components of the XPath (|x:new1|x:new2) 
>>> will then run fast.  I'm not sure why this prevents the range index 
>>> from kicking in, probably because of the complexity of the XPath.
>>> 
>>> Sorting combined results by relevance in either direction is fast.
>>> 
>>> Does anyone have a voodoo trick to enable fast sorting using values 
>>> from two different range indexes?  I don't need to look into the 
>>> documents to get the sort keys, so it seems like it shouldn't have to 
>>> consume expanded tree cache space for this.
>>> 
>>> ---
>>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>>   +44 7879 358 212 (voice)          http://www.ronsoft.com
>>>   +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
>>> "No amount of belief establishes any fact." -Unknown
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>>> 
