Thanks for the ideas, Danny.

I hear you about the pagination idea.  I think we will have to provide 
accurate counts for the entire result set anyway, which I don't think we 
could easily do if we were playing client-side pagination tricks.
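
To make the counting concern concrete, here is a rough sketch of the 
over-fetch-and-filter idea, not code from the thread.  It assumes each 
version is a document whose root is <doc id="..." version="...">, that 
versions are integers, and that $random-query stands in for the user's 
query; the id attribute and the page sizes are hypothetical.

```xquery
xquery version "1.0-ml";

(: placeholder for the user's query; hypothetical :)
declare variable $random-query as cts:query := cts:word-query("example");

(: over-fetch: take 20 candidates in order to fill a page of 10 :)
let $candidates := cts:search(fn:doc(), $random-query)[1 to 20]
let $latest :=
  for $d in $candidates
  let $id := fn:string($d/doc/@id)
  let $version := xs:integer($d/doc/@version)
  (: keep $d only if no stored version of the same document is newer :)
  where fn:empty(/doc[@id eq $id][xs:integer(@version) gt $version])
  return $d
return $latest[1 to 10]
```

The number of candidates that survive the filter varies from page to 
page, so a total for the whole result set is not available without 
filtering every hit.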

We are currently planning to insert is-latest data in each document.  
It's a little more complicated than simply tracking a single is-latest 
flag b/c of the requirement that different users get different slices of 
the data (so have different ideas about what's "latest"), but we do have 
a scheme for handling that in the data.
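
As an illustration only (this is not the actual scheme), suppose every 
version carried one <is-latest slice="..."/> child for each user slice 
in which it is the newest visible version.  The element name, the slice 
attribute, and the slice value below are all made up for the example.

```xquery
xquery version "1.0-ml";

(: hypothetical slice name and placeholder user query :)
declare variable $slice as xs:string := "slice-a";
declare variable $random-query as cts:query := cts:word-query("example");

let $query :=
  cts:and-query((
    cts:element-attribute-value-query(
      xs:QName("is-latest"), xs:QName("slice"), $slice),
    $random-query
  ))
return (
  (: count for the whole result set, resolved from the indexes :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: first page of ten results :)
  cts:search(fn:doc(), $query)[1 to 10]
)
```

Because the constraint is part of the query itself, the count and the 
page come from the same search.  xdmp:estimate is only as accurate as 
the unfiltered index resolution of the query.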

-Mike

On 08/22/2011 01:55 PM, Danny Sokolsky wrote:
> Hi Mike,
>
> Here are a few ideas.
>
> I'm wondering if a straight filtering approach might work fairly well here.  
> I think the answer depends on how frequent your updates are, and how likely 
> it is that an older version will have higher scores (i.e., greater relevance) 
> than the latest version.  What I was thinking is that you can, instead of 
> getting the first 10 results, get the first 20 (or some such number) and then 
> filter them with the XPath you have.  This might be pretty inexpensive to do, 
> as you are only dealing with a small number of documents to filter.  The risk 
> of this approach is that there might be a newer version later down your 
> result sequence.  And it might be a bit trickier to paginate (but seems 
> possible to me).
>
> So I think it depends on what the nitty-gritty of your requirements is.
>
> Another thought is to put a range index on doc/@version and see if you can 
> construct a range query that can satisfy your constraint.  I can't off the 
> top of my head think of a way to do this, as you want to take the max, not 
> compare it to some value.  Maybe if you could identify some buckets of 
> versions (for example, if you knew that new versions were only added once a 
> week), then you can create a facet-like constraint, using 
> cts:element-attribute-value-ranges, for example.
>
> Of course, if you are putting a range index on version, then you can 
> relatively easily write a batch job to add a collection (as Ron mentioned) or 
> add another attribute called is-latest or something.  But it sounds like you 
> do not want to touch your documents (yet).
>
> -Danny
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael Sokolov
> Sent: Monday, August 22, 2011 4:47 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] searching document versions
>
> On 8/22/2011 2:34 AM, Ron Hitchens wrote:
>    
>>      Maybe collections could help here?  If the latest
>> version of each document is in the "LATEST" collection,
>> then it's simply:
>>
>>        cts:search (doc(),
>>          cts:and-query ((cts:collection-query ("LATEST"), $random-query)))
>>
>>      You just need to make sure you manage collections
>> on ingest, so that when a new version of a document is
>> loaded, the previous latest version is removed from the
>> "LATEST" collection.
>>      
> Thanks Ron - yes, we could certainly do that, but the question was
> really about what the best you can do is if you don't allow any stored
> info.  For example, maybe some kind of grouping operation (group by
> document-id) followed by a selection (newest in the group) ... but I
> don't know how to do that in xquery, or whether that is possible.  Just
> wondering if there is some construct I'm unaware of that could help.
>
> -Mike
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
