I have a sorting problem that I can't find a good solution
for.  I'm working on a project where a lot of content exists in
one form which was not designed for efficient searching or sorting.
In fact, MarkLogic is not used at all for search at the moment;
that's what I'm adding.

   This existing format has multiple versions of the content
in each document, with an element range index on an xs:date field.
I can do efficient sorts on this content alone using the ranged
date field in an "order by" clause.

   Here's the complication: a new type of content is being added
in a newer, more MarkLogic-friendly schema.  These documents all
have a common metadata section with a ranged date field.  Different
name and namespace, but serving the same purpose.

   My problem is that I need to do searches across both types of
content and sort them together.  Searching one kind or the other
and sorting by their respective date fields works great for massive
result sets.  But doing them together blows the expanded tree cache
if the result set is large.

  Because of the odd layout of the old content, my searchable
expression is rather funky and looks something like this:

cts:search
(fn:doc()/(/container/group[@state="live"]/doc[fn:not(@foo)]|x:new1|x:new2), $q,
"unfiltered")

   Note that the first one returns a sub-element of the document,
which is actually a fragment root.  The other two on the end return
root elements.

  A FLWOR like this doesn't work:

for $result in cts:search ( . . .)
order by xs:date (($result/old/path/date, $result/new/path/m:sort-date)[1])
return $result

   It runs OK and does the right thing if the result set is
reasonably small (a few thousand), but it blows the cache if
there are too many results.  Trying to ignore one of the
dates also blows the cache:

for $result in cts:search ( . . .)
order by xs:date ($result/old/path/date)
return $result

   But removing the last two components of the XPath (|x:new1|x:new2)
makes it run fast.  I'm not sure why the union prevents the range index
from kicking in; probably it's the complexity of the XPath.

   Sorting combined results by relevance in either direction is fast.

   Does anyone have a voodoo trick to enable fast sorting using values
from two different range indexes?  I don't need to look into the documents
to get the sort keys, so it seems like it shouldn't have to consume expanded
tree cache space for this.
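
   The only fallback I can sketch so far is to run the two fast
searches separately and merge just the first page of each.  This is
a rough sketch, not something I've tested at scale.  It assumes only
the first $page-size results are needed, and the namespace URIs, the
query in $q, and the page size are all made-up placeholders.  The
old/path/date and new/path/m:sort-date paths are the same stand-ins
used in the FLWORs above.

```xquery
xquery version "1.0-ml";

(: Hypothetical namespace URIs; the real ones are not shown in this post. :)
declare namespace x = "http://example.com/new-content";
declare namespace m = "http://example.com/metadata";

(: Placeholder query and page size, for illustration only. :)
declare variable $q := cts:word-query("example");
declare variable $page-size := 20;

(: Old-format hits only, sorted on their own ranged date field. :)
let $old :=
  (for $r in cts:search(
       fn:doc()/container/group[@state = "live"]/doc[fn:not(@foo)],
       $q, "unfiltered")
   order by xs:date($r/old/path/date)
   return $r)[1 to $page-size]

(: New-format hits only, sorted on the metadata sort date. :)
let $new :=
  (for $r in cts:search(fn:doc()/(x:new1 | x:new2), $q, "unfiltered")
   order by xs:date($r/new/path/m:sort-date)
   return $r)[1 to $page-size]

(: At most 2 * $page-size nodes are left, so this last sort is small. :)
return
  (for $r in ($old, $new)
   order by xs:date(($r/old/path/date, $r/new/path/m:sort-date)[1])
   return $r)[1 to $page-size]
```

   The first page of the combined ordering can only contain items from
the first page of each individual ordering, so the merge is correct
for page one.  Getting page k this way means pulling the first
k * $page-size results from each search, so it gets more expensive
the deeper you page, which is why I'd still like a real answer.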

---
Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
     +44 7879 358 212 (voice)          http://www.ronsoft.com
     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown



