Under the covers it looks like mlcp evaluates the document_selector as part of 
a FLWOR. So it won't use the URI lexicon, and probably doesn't need to because 
of the way mlcp works. Anyway, if the supplied document_selector uses the 
indexes properly, it should be no less efficient than what you were doing 
with XQSync. It's also worth knowing that mlcp does *not* wrap the 
document_selector in parens, and it appends a predicate. So it's good practice 
to make sure your document_selector still means what you intend in those 
circumstances.
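
To see why the missing parens matter, here is a small standalone sketch for 
Query Console. The sample elements and the [@n = "9"] predicate are made up 
for illustration; this is not the predicate mlcp actually appends. It only 
shows that a predicate tacked onto an unparenthesized union binds to the last 
step alone:

    xquery version "1.0-ml";
    (: Made-up sample nodes, standing in for selected documents. :)
    let $docs := (<a n="1"/>, <b n="2"/>)
    return (
      (: Unwrapped: the predicate filters only the self::b step. :)
      fn:count($docs/self::a | $docs/self::b[@n = "9"]),
      (: Wrapped: the predicate filters the whole union. :)
      fn:count(($docs/self::a | $docs/self::b)[@n = "9"])
    )

The first count is 1, because the predicate never touched the a element; the 
second is 0. So if your selector is a union or a sequence expression, wrapping 
it in parens yourself is the safer choice.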

When I check with xdmp:plan, the suggested document_selector doesn't use any 
indexes to speak of. That's because it starts in the main document but does all 
the index-sensitive work on the property fragment. MarkLogic can't optimize 
index lookups in that situation.
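
For anyone who wants to repeat that check, a minimal sketch: wrap the 
expression in xdmp:plan and read the returned plan XML. The selector is the 
one suggested earlier in the thread; running it in Query Console against the 
source database is an assumption about your setup.

    xquery version "1.0-ml";
    (: Returns the query plan as XML rather than the matching nodes. :)
    xdmp:plan(
      /*:mets[property::prop:last-modified
        [. > xs:dateTime("2015-01-23T09:55:22-06:00")]])

The plan output shows which index terms, if any, were used to resolve the 
expression, so the same wrapper works for comparing any of the variants 
discussed here.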

Let's try to fix that. Instead of starting in the main document and moving to 
the properties, we'll start in the properties and try to stay there. According 
to xdmp:plan this is a more efficient way to test the last-modified part:

    (xdmp:document-properties()/prop:properties[
       property::prop:last-modified
       gt xs:dateTime("2015-01-23T09:55:22-06:00")])

However that doesn't test the directory. This does both:

    cts:search(
      xdmp:document-properties(),
      cts:and-not-query(
        cts:element-range-query(
          xs:QName("prop:last-modified"), ">",
          xs:dateTime("2015-01-23T09:55:22-06:00")),
        cts:directory-query('/events/', 'infinity')))

According to xdmp:plan this should be optimal. The trick here is to understand 
that directory terms are indexed for both the main document and the property 
fragment. Because the last-modified term uses the property fragment, it's 
better to use the property fragment throughout. Also the and-not query should 
be a little more efficient than an and-query with a not-query term.
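
On the question of a dynamically computed date: a sketch of the same search 
with the cutoff computed at run time, reusing the $date-one-day-ago 
expression from the original INPUT_QUERY. The cutoff is evaluated once, 
before the query is built, so it should not change how the indexes are used. 
Like the original query, this assumes a dateTime element range index on 
prop:last-modified. Whether mlcp accepts a let clause inside 
-document_selector is not something this sketch establishes.

    xquery version "1.0-ml";
    (: Same search as above, with the cutoff computed at run time. :)
    let $date-one-day-ago :=
      fn:current-dateTime() - xs:dayTimeDuration("P1D")
    return
      cts:search(
        xdmp:document-properties(),
        cts:and-not-query(
          cts:element-range-query(
            xs:QName("prop:last-modified"), ">", $date-one-day-ago),
          cts:directory-query('/events/', 'infinity')))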

The next optimization idea might be to replace cts:search with cts:uris...

    cts:uris(
      (), 'properties',
      cts:and-not-query(
        cts:element-range-query(
          xs:QName("prop:last-modified"), ">",
          xs:dateTime("2015-01-23T09:55:22-06:00")),
        cts:directory-query('/events/', 'infinity')))

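That would probably improve efficiency with XQSync and INPUT_QUERY. However 
with mlcp it won't work, presumably because cts:uris returns URI strings 
rather than nodes and mlcp expects the document_selector to select nodes. It 
probably wouldn't help anyway because of the way mlcp does its processing.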

-- Mike

> On 23 Jan 2015, at 09:21 , Danny Sinang <[email protected]> wrote:
> 
> Hi Kevin,
> 
> Thanks for the sample MLCP call.
> 
> Performance-wise though, won't the XPATH expression perform real slow 
> (compared to my XQSync INPUT_QUERY), especially if I have a dynamically 
> computed predicate (like when you compute for one day ago) ?
> 
> Will -document_selector accept FLWOR statements ?
> 
> Regards,
> Danny
> 
> 
> On Fri, Jan 23, 2015 at 11:58 AM, Kevin Ford <[email protected]> wrote:
> Hi Danny,
> 
> Try using the -document_selector option.  It seemed to do the trick in a 
> small test based on your use case and sample xquery. You’ll have to figure 
> out the $date-one-day-ago before invoking it.  
> 
> Example:
> 
> mlcp.sh export -host localhost -port 7003 -username kefo -password admin 
> -mode local -output_file_path export/ -document_selector 
> '/*:mets[property::prop:last-modified[.>xs:dateTime("2015-01-23T09:55:22-06:00")]]'
> 
> HTH,
> Kevin
> 
> 
> 
> From: Danny Sinang <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Thursday, January 22, 2015 at 9:26 AM
> To: general <[email protected]>
> Subject: [MarkLogic Dev General] Can MLCP use an input query ?
> 
> We're considering using MLCP to replace XQSync and it seems MLCP is up to the 
> job except that it doesn't accept an INPUT_QUERY parameter.
> 
> In XQSync, we use the following query to fetch all docs whose last-modified 
> property is less than 1 day old.
> 
> INPUT_QUERY=declare variable $date-one-day-ago  := fn:current-dateTime() - 
> xs:dayTimeDuration("P1D"); for $d in cts:search(/, cts:and-query( ( 
> cts:not-query(cts:directory-query('/events/', 'infinity')), 
> cts:properties-query(cts:element-range-query(xs:QName("prop:last-modified"), 
> ">", $date-one-day-ago)) ) ) ) let $uri := xdmp:node-uri($d) order by $uri 
> return $uri
> 
> 
> But in MLCP, it appears the only way to selectively choose which docs to copy 
> would be by copying entire collections, directories, or by specifying an 
> XPath expression - none of which allow me to use a query.
> 
> Is there a workaround for this ?
> 
> Regards,
> Danny
> 
> 
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
