Thanks for the ideas, Danny. I hear you about the pagination idea. I think we will have to provide accurate counts for the entire result set anyway, which I don't think we could easily do if we were playing client-side pagination tricks.
We are currently planning to insert is-latest data in each document. It's a little more complicated than simply tracking a single is-latest flag because of the requirement that different users get different slices of the data (so have different ideas about what's "latest"), but we do have a scheme for handling that in the data.

-Mike

On 08/22/2011 01:55 PM, Danny Sokolsky wrote:
> Hi Mike,
>
> Here are a few ideas.
>
> I'm wondering if a straight filtering approach might work fairly well here.
> I think the answer depends on how frequent your updates are, and how likely
> it is that an older version will have higher scores (i.e., greater relevance)
> than the latest version. What I was thinking is that you can, instead of
> getting the first 10 results, get the first 20 (or some such number) and then
> filter them with the XPath you have. This might be pretty inexpensive to do,
> as you are only dealing with a small number of documents to filter. The risk
> of this approach is that there might be a newer version later down your
> result sequence. And it might be a bit trickier to paginate (but seems
> possible to me).
>
> So I think it depends on what the nitty-gritty of your requirements is.
>
> Another thought is to put a range index on doc/@version and see if you can
> construct a range query that can satisfy your constraint. I can't off the
> top of my head think of a way to do this, as you want to take the max, not
> compare it to some value. Maybe if you could identify some buckets of
> versions (for example, if you knew that new versions were only added once a
> week), then you can create a facet-like constraint, using
> cts:element-attribute-value-ranges, for example.
>
> Of course, if you are putting a range index on version, then you can
> relatively easily write a batch job to add a collection (as Ron mentioned) or
> add another attribute called is-latest or something. But it sounds like you
> do not want to touch your documents (yet).
>
> -Danny
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Sokolov
> Sent: Monday, August 22, 2011 4:47 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] searching document versions
>
> On 8/22/2011 2:34 AM, Ron Hitchens wrote:
>
>> Maybe collections could help here? If the latest
>> version of each document is in the "LATEST" collection,
>> then it's simply:
>>
>>    cts:search (doc(), cts:and-query (cts:collection-query ("LATEST"),
>>        $random-query))
>>
>> You just need to make sure you manage collections
>> on ingest, so that when a new version of a document is
>> loaded, the previous latest version is removed from the
>> "LATEST" collection.
>>
> Thanks Ron - yes, we could certainly do that, but the question was
> really about what the best you can do is if you don't allow any stored
> info. For example, maybe some kind of grouping operation (group by
> document-id) followed by a selection (newest in the group) ... but I
> don't know how to do that in xquery, or whether that is possible. Just
> wondering if there is some construct I'm unaware of that could help.
>
> -Mike
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
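
For reference, the "group by document-id, then pick the newest in the group" idea from the original question can be sketched in plain XQuery 1.0 along the following lines. This is only an illustration under stated assumptions: the sample documents are made up, the @id attribute is a hypothetical name for the document-id, and @version follows the doc/@version mentioned earlier in the thread.

```xquery
xquery version "1.0";

(: Hypothetical in-memory stand-ins for a sequence of search results.
   @id is an assumed name for the document-id; @version follows the
   doc/@version attribute discussed in the thread. :)
let $results := (
  <doc id="a" version="1">first draft</doc>,
  <doc id="a" version="3">third draft</doc>,
  <doc id="b" version="2">only version of b</doc>,
  <doc id="a" version="2">second draft</doc>
)
(: Group by document-id ... :)
for $id in distinct-values($results/@id)
let $group := $results[@id = $id]
(: ... then keep only the highest version within each group. :)
let $newest := max(for $v in $group/@version return xs:integer($v))
return $group[xs:integer(@version) eq $newest]
```

With the sample data above, this should return version 3 of "a" and version 2 of "b". Note that it works on a sequence that has already been fetched, so it is a filter over results rather than something resolved in the query itself. That means it probably does not help with accurate counts over the whole result set, which is part of why storing is-latest data in the documents looks like the more practical route.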
