Re: [MarkLogic Dev General] Fast order-by with multiple range indexes?

Michael Blakeley Fri, 02 Aug 2013 10:14:36 -0700

Which release?

So you're using an element-range index? What about using a path-range index?


-- Mike

On 2 Aug 2013, at 09:26 , Ron Hitchens <[email protected]> wrote:

> 
>   I have a sorting problem that I can't find a good solution
> for.  I'm working on a project where a lot of content exists in
> one form which was not designed for efficient searching or sorting.
> In fact, MarkLogic is not used at all for search at the moment,
> that's what I'm adding.
> 
>   This existing format has multiple versions of the content
> in each document, with an element range index on an xs:date field.
> I can do efficient sorts on this content alone using the ranged
> date field in an "order by" clause.
> 
>   Here's the complication: a new type of content is being added
> in a newer, more MarkLogic-friendly schema.  These documents all
> have a common metadata section with a ranged date field.  Different
> name and namespace, but serving the same purpose.
> 
>   My problem is that I need to do searches across both types of
> content and sort them together.  Searching one kind or the other
> and sorting by their respective date fields works great for massive
> result sets.  But doing them together blows the expanded tree cache
> if the result set is large.
> 
>  Because of the odd layout of the old content, my searchable
> expression is rather funky and looks something like this:
> 
> cts:search 
> (fn:doc()/(/container/group[@state="live"]/doc[fn:not(@foo)]|x:new1|x:new2), 
> $q, "unfiltered")
> 
>   Note that the first one returns a sub-element of the document,
> which is actually a fragment root.  The other two on the end return
> root elements.
> 
>  A FLWOR like this doesn't work:
> 
> for $result in cts:search ( . . .)
> order by xs:date (($result/old/path/date, $result/new/path/m:sort-date)[1])
> return $result
> 
>   It runs OK and will do the right thing if the result
> set is reasonably small (a few thousand), but will blow the cache
> if there are too many results.  Trying to ignore one of the
> dates also blows the cache:
> 
> for $result in cts:search ( . . .)
> order by xs:date ($result/old/path/date)
> return $result
> 
>   But removing the last two components of the XPath (|x:new1|x:new2)
> will then run fast.  I'm not sure why this prevents the range index
> from kicking in, probably because of the complexity of the XPath.
> 
>   Sorting combined results by relevance in either direction is fast.
> 
>   Does anyone have a voodoo trick to enable fast sorting using values
> from two different range indexes?  I don't need to look into the documents
> to get the sort keys, so it seems like it shouldn't have to consume expanded
> tree cache space for this.
> 
> ---
> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>     +44 7879 358 212 (voice)          http://www.ronsoft.com
>     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
> "No amount of belief establishes any fact." -Unknown
> 
> 
> 
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 

