Under the covers, it looks like mlcp evaluates the document_selector as part
of a FLWOR expression. So it won't use the URI lexicon, and probably doesn't
need to because of the way mlcp works. In any case, if the supplied
document_selector uses the indexes properly, it should be no less efficient
than what you were doing with XQSync. It's also worth knowing that mlcp does
*not* wrap the document_selector in parentheses, and it appends a predicate.
So it's good practice to make sure your document_selector still means what
you intend in those circumstances. For example, with a top-level union the
appended predicate would bind only to the last branch.
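
To make that concrete, here is a minimal Python sketch of a wrapper-side
helper that parenthesizes a selector before it is handed to mlcp. The helper
name safe_selector and the placeholder [PREDICATE] are hypothetical; the
actual predicate that mlcp appends is not reproduced here.

```python
def safe_selector(selector: str) -> str:
    """Wrap an XPath selector in parentheses so that a predicate appended
    afterwards applies to the whole expression, not just its last step."""
    return "(" + selector.strip() + ")"


# Hypothetical stand-in for whatever predicate mlcp appends.
APPENDED = "[PREDICATE]"

raw = "/a | /b"

# Without parentheses the predicate binds to /b only.
print(raw + APPENDED)

# With parentheses the predicate filters the whole union.
print(safe_selector(raw) + APPENDED)
```

A selector that is already a single path expression is unaffected by the
extra parentheses, so wrapping unconditionally should be harmless.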

When I check with xdmp:plan, the suggested document_selector doesn't use any
indexes to speak of. That's because it starts in the main document but does all
the index-sensitive work on the property fragment. MarkLogic can't optimize
index lookups in that situation.

Let's try to fix that. Instead of starting in the main document and moving to
the properties, we'll start in the properties and try to stay there. According
to xdmp:plan, this is a more efficient way to test the last-modified part:

    (xdmp:document-properties()/prop:properties[
      property::prop:last-modified
      gt xs:dateTime("2015-01-23T09:55:22-06:00")])

However, that doesn't test the directory. This does both:

    cts:search(
      xdmp:document-properties(),
      cts:and-not-query(
        cts:element-range-query(
          xs:QName("prop:last-modified"), ">",
          xs:dateTime("2015-01-23T09:55:22-06:00")),
        cts:directory-query('/events/', 'infinity')))

According to xdmp:plan, this should be optimal. The trick here is to understand
that directory terms are indexed for both the main document and the property
fragment. Because the last-modified term uses the property fragment, it's
better to use the property fragment throughout. Also, the and-not-query should
be a little more efficient than an and-query with a not-query term.

The next optimization idea might be to replace cts:search with cts:uris:

    cts:uris(
      (), 'properties',
      cts:and-not-query(
        cts:element-range-query(
          xs:QName("prop:last-modified"), ">",
          xs:dateTime("2015-01-23T09:55:22-06:00")),
        cts:directory-query('/events/', 'infinity')))

That would probably improve efficiency with XQSync and INPUT_QUERY. However,
with mlcp it won't work, and it probably wouldn't help anyway because of the
way mlcp does its processing.
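
Because mlcp has no INPUT_QUERY, the cutoff timestamp has to be computed
before mlcp is invoked. The sketch below is one way to do that in Python,
assuming a small wrapper script builds the command line. The function names
are made up for illustration, the connection settings are simply the values
from Kevin's example, and the selector is his, not an index-optimized one.

```python
import shlex
from datetime import datetime, timedelta, timezone


def one_day_ago(now=None):
    """Return an xs:dateTime-compatible string for 24 hours before `now`
    (defaults to the current time in UTC)."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=1)).replace(microsecond=0).isoformat()


def export_args(cutoff):
    """Build the mlcp export argument list around Kevin's selector.
    Host, port, credentials and output path are his example values."""
    selector = (
        '/*:mets[property::prop:last-modified'
        '[. > xs:dateTime("%s")]]' % cutoff
    )
    return [
        "mlcp.sh", "export",
        "-host", "localhost", "-port", "7003",
        "-username", "kefo", "-password", "admin",
        "-mode", "local",
        "-output_file_path", "export/",
        "-document_selector", selector,
    ]


# Print the command instead of running it, so it can be inspected first.
print(" ".join(shlex.quote(arg) for arg in export_args(one_day_ago())))
```

Passing the arguments as a list to something like subprocess.run would avoid
shell quoting problems with the double quotes inside the selector.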

-- Mike

> On 23 Jan 2015, at 09:21, Danny Sinang <[email protected]> wrote:
>
> Hi Kevin,
>
> Thanks for the sample MLCP call.
>
> Performance-wise though, won't the XPath expression be much slower
> (compared to my XQSync INPUT_QUERY), especially if I have a dynamically
> computed predicate (such as computing one day ago)?
>
> Will -document_selector accept FLWOR expressions?
>
> Regards,
> Danny
>
>
> On Fri, Jan 23, 2015 at 11:58 AM, Kevin Ford <[email protected]> wrote:
> Hi Danny,
>
> Try using the -document_selector option. It seemed to do the trick in a
> small test based on your use case and sample xquery. You’ll have to figure
> out the $date-one-day-ago before invoking it.
>
> Example:
>
> mlcp.sh export -host localhost -port 7003 -username kefo -password admin
> -mode local -output_file_path export/ -document_selector
> '/*:mets[property::prop:last-modified[.>xs:dateTime("2015-01-23T09:55:22-06:00")]]'
>
> HTH,
> Kevin
>
>
>
> From: Danny Sinang <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Thursday, January 22, 2015 at 9:26 AM
> To: general <[email protected]>
> Subject: [MarkLogic Dev General] Can MLCP use an input query ?
>
> We're considering using MLCP to replace XQSync and it seems MLCP is up to the
> job except that it doesn't accept an INPUT_QUERY parameter.
>
> In XQSync, we use the following query to fetch all docs whose last-modified
> property is more recent than one day ago.
>
> INPUT_QUERY=declare variable $date-one-day-ago := fn:current-dateTime() -
> xs:dayTimeDuration("P1D"); for $d in cts:search(/, cts:and-query( (
> cts:not-query(cts:directory-query('/events/', 'infinity')),
> cts:properties-query(cts:element-range-query(xs:QName("prop:last-modified"),
> ">", $date-one-day-ago)) ) ) ) let $uri := xdmp:node-uri($d) order by $uri
> return $uri
>
>
> But in MLCP, it appears the only way to selectively choose which docs to copy
> would be by copying entire collections or directories, or by specifying an
> XPath expression, none of which allow me to use a query.
>
> Is there a workaround for this?
>
> Regards,
> Danny
>
>
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general