If there are no search terms then documents will appear in native database 
order, also called "document order". This is something like RDBMS "row order". 
Generally speaking that won't be the same as the insertion order.

So how can we get the most recent documents in a reliable way? If you have 
maintain-last-modified enabled for the database, every document has a property 
fragment with a prop:last-modified element. The docs guide 
https://docs.marklogic.com/guide/app-dev/properties talks about this feature, 
and https://docs.marklogic.com/admin-help/database describes the database 
configuration. Note that enabling it won't affect documents inserted 
previously. You'll have to reinsert them or update them, and then they'll get a 
prop:last-modified timestamp as of that latest insert or update.
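
If you'd rather script that setting than click through the admin UI, something 
like this sketch using the Admin API should do it. I'm assuming a database 
named 'Documents' here, which is just a placeholder: substitute your own.

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: 'Documents' is a placeholder database name, not from the original post :)
let $config := admin:get-configuration()
let $config := admin:database-set-maintain-last-modified(
  $config, xdmp:database('Documents'), true())
return admin:save-configuration($config)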

When sorting or querying on prop:last-modified, you'll want it to be fast. Per 
https://docs.marklogic.com/guide/performance/order_by that will be most 
efficient with an element range index. But watch out: the last-modified 
property isn't part of the main document fragment, and sorting can't use range 
index data from a different fragment.

So we have to sort the property fragments by prop:last-modified first. Then we 
can do other things with the results. Let's try that.

First, create an element range index of type dateTime on 
{http://marklogic.com/xdmp/property}last-modified. See 
https://docs.marklogic.com/admin-help/range-element-index and 
https://docs.marklogic.com/guide/admin/range_index for more on that topic.
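
The index can also be created with the Admin API. Treat this as a sketch 
rather than a tested recipe: 'Documents' is again a placeholder database name, 
and I've left range value positions off.

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: 'Documents' is a placeholder database name, not from the original post :)
let $config := admin:get-configuration()
let $index := admin:database-range-element-index(
  'dateTime',                            (: scalar type :)
  'http://marklogic.com/xdmp/property',  (: element namespace :)
  'last-modified',                       (: element local name :)
  '',                                    (: collation: empty for non-strings :)
  false())                               (: range value positions :)
let $config := admin:database-add-range-element-index(
  $config, xdmp:database('Documents'), $index)
return admin:save-configuration($config)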

Next we'll need some test content. Here's a query to insert 10 documents with 
different timestamps.

(: insert test documents :)
for $i in 1 to 10
let $_ := xdmp:invoke-function(
  function() {
    let $id := xdmp:integer-to-hex(xdmp:random())
    return xdmp:document-insert(
      '/test/'||$id,
      element test { attribute id { $id } }),
    xdmp:commit() },
  <options xmlns="xdmp:eval">
    <transaction-mode>update</transaction-mode>
  </options>)
let $_ := xdmp:sleep(1000)
return $i
=> 1 2 3 4 5 6 7 8 9 10

That query uses some ML7 features, but you should be able to port it to ML5 
without too much trouble. The '||' operator does string concatenation, like 
fn:concat or the Java '+' operator on strings. You can use an xdmp:eval instead 
of the invoke-function magic, and you shouldn't need the xdmp:commit. The 
important bit is the xdmp:sleep call between sub-transactions, which ensures 
that each document has a different prop:last-modified. Let's check that.
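
For ML5, a port might look something like the following. This is a sketch that 
I haven't run on ML5: it assumes xdmp:eval runs each string as its own 
transaction, and it swaps '||' for fn:concat.

(: insert test documents, ML5-style sketch :)
for $i in 1 to 10
let $_ := xdmp:eval('
  xquery version "1.0-ml";
  (: each eval is a separate update transaction :)
  let $id := xdmp:integer-to-hex(xdmp:random())
  return xdmp:document-insert(
    fn:concat("/test/", $id),
    element test { attribute id { $id } })')
(: pause so the next document gets a later timestamp :)
let $_ := xdmp:sleep(1000)
return $i

Whichever version you run, the next query lists the resulting timestamps.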

xdmp:document-properties()/prop:properties/prop:last-modified/data(.)
=>
2014-08-08T08:43:39-07:00
2014-08-08T08:43:43-07:00
2014-08-08T08:43:44-07:00
2014-08-08T08:43:38-07:00
2014-08-08T08:43:47-07:00
2014-08-08T08:43:45-07:00
2014-08-08T08:43:46-07:00
2014-08-08T08:43:42-07:00
2014-08-08T08:43:40-07:00
2014-08-08T08:43:41-07:00

These timestamps are in document order, which doesn't match the original insert 
order. In fact it looks random. But we can still get the most recent N 
documents using prop:last-modified.

xdmp:query-trace(true()),
let $count := 5
let $start := 1
let $stop := $start + $count - 1
return (
  for $p in xdmp:document-properties()
  order by $p/prop:properties/prop:last-modified descending
  return text {
    xdmp:node-uri($p),
    $p/prop:properties/prop:last-modified })[$start to $stop]
=>
/test/8819ad493f97c9dd 2014-08-08T08:43:47-07:00
/test/bb8f23b3fc0446f7 2014-08-08T08:43:46-07:00
/test/b33af6f2becf4262 2014-08-08T08:43:45-07:00
/test/95fa44068813646 2014-08-08T08:43:44-07:00
/test/b1e91203787593ad 2014-08-08T08:43:43-07:00

Remember that we generated the ids and URIs with xdmp:random, so yours will be 
different. Production code probably wouldn't include that xdmp:query-trace, but 
it lets us see if the database really used the prop:last-modified range index. 
The query trace output appears in ErrorLog.txt:

Analyzing path for $p: xdmp:document-properties()
Step 1 is searchable: xdmp:document-properties()
Path is fully searchable.
Gathering constraints.
Step 1 contributed 1 constraint: xdmp:document-properties()
Order by clause contributed 1 range ordering constraint for $p: order by 
$p/prop:properties/prop:last-modified descending
Executing search.
Selected 10 fragments to filter.

That "Order by clause..." line tells us that sorting used the range index. So 
I'd expect this query to be efficient and to scale well as the database grows.

Now we know how to fetch the N most recent URIs quickly. We could also query 
for URIs modified after a certain dateTime, using that same range index on 
prop:last-modified.

for $p in cts:search(
  xdmp:document-properties(),
  cts:element-range-query(xs:QName('prop:last-modified'),
  '>', xs:dateTime('2014-08-08T08:43:42-07:00')))
order by $p/prop:properties/prop:last-modified descending
return text {
  xdmp:node-uri($p),
  $p/prop:properties/prop:last-modified }
=>
/test/8819ad493f97c9dd 2014-08-08T08:43:47-07:00
/test/bb8f23b3fc0446f7 2014-08-08T08:43:46-07:00
/test/b33af6f2becf4262 2014-08-08T08:43:45-07:00
/test/95fa44068813646 2014-08-08T08:43:44-07:00
/test/b1e91203787593ad 2014-08-08T08:43:43-07:00

This time the 'order by' wasn't needed to select the right documents, since the 
range query already does that. But it returns them newest first, which makes 
the results easier to read.

Now that we know how to get the URIs from recently modified documents, we might 
want the original documents. That's pretty easy, and the technique is the same 
with either query. Keep in mind that the extra fn:doc call fetches one more 
fragment per result, so its cost grows linearly with the number of results. So 
fetch the main document if you need to, but don't do it if the URI alone is 
enough.

for $p in cts:search(
  xdmp:document-properties(),
  cts:element-range-query(xs:QName('prop:last-modified'),
  '>', xs:dateTime('2014-08-08T08:43:42-07:00')))
order by $p/prop:properties/prop:last-modified descending
return doc(xdmp:node-uri($p))
=>
<test id="8819ad493f97c9dd"/>
<test id="bb8f23b3fc0446f7"/>
<test id="b33af6f2becf4262"/>
<test id="95fa44068813646"/>
<test id="b1e91203787593ad"/>

You could use the same technique in the "N most recent" version of the query, 
too.
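
Here's roughly what that might look like. It's a sketch along the same lines as 
the earlier query: the inner FLWOR keeps the same shape so the 'order by' can 
still use the range index, and fn:doc is only called for the N property 
fragments that survive the [$start to $stop] predicate.

let $count := 5
let $start := 1
let $stop := $start + $count - 1
(: first pick the N most recent property fragments :)
for $p in (
  for $p in xdmp:document-properties()
  order by $p/prop:properties/prop:last-modified descending
  return $p)[$start to $stop]
(: then fetch the matching main documents :)
return doc(xdmp:node-uri($p))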

-- Mike

On 8 Aug 2014, at 06:04 , Chad Bishop <[email protected]> wrote:

> Greetings,
>  
> Is there any built-in functionality to retrieve the most recently added 
> documents to a collection or directory?
>  
> It looks like a blank search does the trick, but would prefer something more 
> efficient.
>  
> We’re still on ML 5.
>  
> Thanks much,
>  
> -Chad
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
