Hi Mike,

Here are a few ideas.

I'm wondering if a straight filtering approach might work fairly well here.  I 
think the answer depends on how frequent your updates are, and how likely it is 
that an older version will score higher (i.e., be more relevant) than the 
latest version.  What I was thinking is that, instead of getting the first 10 
results, you could get the first 20 (or some such number) and then filter them 
with the XPath you have.  This should be fairly inexpensive, as you are only 
filtering a small number of documents.  The risk of this approach is that a 
newer version might sit further down the result sequence, outside the window 
you fetched.  Pagination might also be a bit trickier (but seems possible to 
me).
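
A rough sketch of the over-fetch-and-filter idea, not a tested 
implementation.  I am assuming each document looks like 
<doc id="..." version="..."> with integer versions; the id attribute, the 
page size and the stand-in for $random-query are all made up for 
illustration, and the predicate only stands in for whatever XPath you have.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for the real query. :)
let $random-query := cts:word-query("example")
let $page-size := 10

(: Over-fetch: take twice as many results as the page needs. :)
let $candidates := cts:search(fn:doc(), $random-query)[1 to 2 * $page-size]

let $latest :=
  for $d in $candidates
  let $id := $d/doc/@id                      (: assumed document-id attribute :)
  let $v  := xs:integer($d/doc/@version)
  (: Keep $d only if no other candidate with the same id has a higher version. :)
  where fn:empty($candidates[doc/@id eq $id][xs:integer(doc/@version) gt $v])
  return $d

(: Relevance order is preserved, so just trim to the page size. :)
return $latest[1 to $page-size]
```

Note that this only compares versions within the fetched window, which is 
exactly the risk described above: a newer version ranked below the window is 
never seen.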

So I think it depends on what the nitty-gritty of your requirements is.

Another thought is to put a range index on doc/@version and see if you can 
construct a range query that satisfies your constraint.  Off the top of my 
head I can't think of a way to do this, as you want to take the max rather than 
compare against some fixed value.  Maybe if you could identify some buckets of 
versions (for example, if you knew that new versions were only added once a 
week), then you could create a facet-like constraint, using 
cts:element-attribute-value-ranges, for example.
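
To make the bucket idea a little more concrete, here is a minimal sketch.  It 
assumes an int range index already exists on doc/@version; the bucket 
boundaries and the stand-in for $random-query are invented for illustration.  
It only reports which version buckets the matching documents fall into, and 
does not by itself solve the "latest per document" problem.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for the real query. :)
let $random-query := cts:word-query("example")

(: Hypothetical bucket boundaries; the type must match the range index. :)
let $bounds := (xs:int(10), xs:int(20), xs:int(30))

(: One cts:range element per non-empty bucket, limited to matching documents. :)
for $range in cts:element-attribute-value-ranges(
                xs:QName("doc"), xs:QName("version"),
                $bounds, (), $random-query)
return
  <bucket min="{$range/cts:minimum}"
          max="{$range/cts:maximum}"
          count="{cts:frequency($range)}"/>
```

If the highest non-empty bucket is known, its bounds could then feed a 
cts:element-attribute-range-query to restrict the search to that bucket.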

Of course, if you are putting a range index on version, then you can relatively 
easily write a batch job to add a collection (as Ron mentioned) or add another 
attribute called is-latest or something.  But it sounds like you do not want to 
touch your documents (yet).
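
For what it's worth, such a batch job might look roughly like the sketch 
below.  It assumes range indexes on both doc/@version (int) and a 
hypothetical doc/@id (string, default collation), and it reuses Ron's 
"LATEST" collection name.  On a large database you would want to split this 
into smaller transactions rather than run it as a single request.

```xquery
xquery version "1.0-ml";

(: Walk every distinct document id from the (assumed) id lexicon. :)
for $id in cts:element-attribute-values(xs:QName("doc"), xs:QName("id"))
let $id-query :=
  cts:element-attribute-value-query(xs:QName("doc"), xs:QName("id"), $id)

(: Highest version for this id: first value of the lexicon in descending order. :)
let $max :=
  cts:element-attribute-values(
    xs:QName("doc"), xs:QName("version"), (),
    ("descending", "limit=1"), $id-query)

for $d in cts:search(fn:doc(), $id-query)
let $uri := xdmp:node-uri($d)
return
  if (xs:integer($d/doc/@version) eq $max)
  then xdmp:document-add-collections($uri, "LATEST")      (: newest version :)
  else xdmp:document-remove-collections($uri, "LATEST")   (: older versions :)
```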

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Sokolov
Sent: Monday, August 22, 2011 4:47 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] searching document versions

On 8/22/2011 2:34 AM, Ron Hitchens wrote:
>     Maybe collections could help here?  If the latest
> version of each document is in the "LATEST" collection,
> then it's simply:
>
>       cts:search (doc(), cts:and-query ((cts:collection-query ("LATEST"),
>                                          $random-query)))
>
>     You just need to make sure you manage collections
> on ingest, so that when a new version of a document is
> loaded, the previous latest version is removed from the
> "LATEST" collection.
Thanks Ron - yes, we could certainly do that, but the question was 
really about what the best you can do is if you don't allow any stored 
info.  For example, maybe some kind of grouping operation (group by 
document-id) followed by a selection (newest in the group) ... but I 
don't know how to do that in xquery, or whether that is possible.  Just 
wondering if there is some construct I'm unaware of that could help.

-Mike
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general