Hi Mike,

Here are a few ideas.
I'm wondering if a straight filtering approach might work fairly well here. I think the answer depends on how frequent your updates are, and how likely it is that an older version will score higher (i.e., be more relevant) than the latest version. What I was thinking is that, instead of getting the first 10 results, you could get the first 20 (or some such number) and then filter them with the XPath you have. This might be pretty inexpensive, as you are only filtering a small number of documents. The risk of this approach is that there might be a newer version further down your result sequence. It might also be a bit trickier to paginate (but that seems possible to me). So I think it depends on the nitty-gritty of your requirements.

Another thought is to put a range index on doc/@version and see if you can construct a range query that satisfies your constraint. I can't think of a way to do this off the top of my head, as you want to take the max, not compare it to some value. Maybe if you could identify some buckets of versions (for example, if you knew that new versions were only added once a week), then you could create a facet-like constraint, using cts:element-attribute-value-ranges, for example.

Of course, if you are putting a range index on version, then you can relatively easily write a batch job to add a collection (as Ron mentioned) or add another attribute called is-latest or something. But it sounds like you do not want to touch your documents (yet).

-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Sokolov
Sent: Monday, August 22, 2011 4:47 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] searching document versions

On 8/22/2011 2:34 AM, Ron Hitchens wrote:
> Maybe collections could help here? If the latest
> version of each document is in the "LATEST" collection,
> then it's simply:
>
> cts:search (doc(), cts:and-query ((cts:collection-query ("LATEST"),
> $random-query)))
>
> You just need to make sure you manage collections
> on ingest, so that when a new version of a document is
> loaded, the previous latest version is removed from the
> "LATEST" collection.

Thanks Ron - yes, we could certainly do that, but the question was really about what the best you can do is if you don't allow any stored info. For example, maybe some kind of grouping operation (group by document-id) followed by a selection (newest in the group) ... but I don't know how to do that in xquery, or whether that is possible. Just wondering if there is some construct I'm unaware of that could help.

-Mike
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
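
For what it's worth, the grouping asked about above (group by document id, then keep the newest in each group) can be approximated in plain XQuery 1.0 over a small window of hits, which is roughly the over-fetch-and-filter idea. This is only a sketch under assumptions: the thread mentions doc/@version, but the @id attribute, the integer versions, and the sample data are made up for illustration, and in MarkLogic the $hits sequence would presumably come from something like the first 20 items of a cts:search call.

```xquery
xquery version "1.0";

(: Hypothetical stand-in for the first N search hits, in relevance order.
   The @id attribute and the sample values are assumptions. :)
declare variable $hits as element(doc)* := (
  <doc id="a" version="2"/>,
  <doc id="b" version="1"/>,
  <doc id="a" version="3"/>,
  <doc id="c" version="5"/>,
  <doc id="b" version="4"/>
);

(: Keep a hit only if no other hit with the same id has a higher version.
   Iterating over $hits preserves the original relevance order. :)
for $hit in $hits
let $newest := max($hits[@id = $hit/@id]/xs:integer(@version))
where xs:integer($hit/@version) = $newest
return $hit
```

Note that this only compares versions among the hits that were fetched, so it carries the risk already mentioned: a newer version sitting further down the result sequence would not be seen, and a page can come back with fewer items than requested.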
