> By “which query” I mean which of the 125,000 separate query docs actually 
> matched for a given cts:reverse-query() call. 

cts:search(
  doc(),
  cts:reverse-query(doc("newdoc.xml"))
)

This returns every document that contains a serialized query matching 
newdoc.xml, so the result set itself tells you which queries matched.
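
If you only want to know which query docs matched, something like this should list their URIs. It is a minimal sketch that reuses the newdoc.xml name from above and assumes nothing else about your data:

```xquery
xquery version "1.0-ml";

(: Each result of the search is a stored query document that matched,
   so its URI identifies "which query" fired for newdoc.xml. :)
for $query-doc in cts:search(doc(), cts:reverse-query(doc("newdoc.xml")))
return xdmp:node-uri($query-doc)
```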

> I guess my question is: in the case where the reverse query is applied to an 
> element that is not a full document, does the “brute force” have to be 
> applied for every candidate query or only for those that match containing 
> document of the input element? 

In general I avoid putting any XPath in the first argument of cts:search(), 
because it gives a false sense of optimization.  In the JavaScript API it's 
not even possible.
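
One way to keep the first argument plain and still learn which elements matched is to check the elements yourself with cts:contains(). This is only a rough sketch: the element name "para" is hypothetical, and it assumes each stored query is the root element of its document. It also assumes that any query matching an element matches its containing document too, which may not hold for queries that use negation.

```xquery
xquery version "1.0-ml";

(: Assumptions: "para" is a hypothetical element name for the pieces
   you care about, and each query document's root element is the
   serialized query. :)
let $input := doc("newdoc.xml")
(: Phase 1: find the stored queries that match the whole document. :)
for $query-doc in cts:search(doc(), cts:reverse-query($input))
(: Rebuild a cts:query from its serialized XML form. :)
let $query := cts:query($query-doc/element())
(: Phase 2: test each candidate element against that query. :)
for $elem in $input//para
where cts:contains($elem, $query)
return
  <match query-uri="{xdmp:node-uri($query-doc)}"
         element="{xdmp:path($elem)}"/>
```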

> If the brute force cost is applied to each query then doing a two-phase 
> search would be faster: determine which reverse queries apply to the input 
> document and then use those to find the elements within the input document 
> that actually matched. But if the brute force cost only applies to those 
> queries that match the containing doc then ML internally must produce the 
> faster result than doing it in my own code. 
> 
> But as you say, that calls into the question the use of reverse queries at 
> all: why not simply run the 125,000 forward queries and update each element 
> matched as appropriate?

Yep.  If it's a one-time batch job and you're trying to minimize the time then 
this would be faster, I bet.
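
A forward-query batch could look roughly like this. It is a sketch under assumptions: the collection name "stored-queries" is made up, each query document's root element is assumed to be the serialized query, and the <hit/> element just stands in for whatever update you need to make. With 125,000 queries you would probably want to split the work into smaller batches rather than run it as a single request.

```xquery
xquery version "1.0-ml";

(: Assumption: the stored queries live in a hypothetical collection
   named "stored-queries". :)
for $query-doc in collection("stored-queries")
(: Rebuild a cts:query from its serialized XML form. :)
let $query := cts:query($query-doc/element())
(: Run it as an ordinary forward search over the database. :)
for $hit in cts:search(doc(), $query)
return
  (: Placeholder: replace with the update each match should trigger. :)
  <hit query-uri="{xdmp:node-uri($query-doc)}"
       doc-uri="{xdmp:node-uri($hit)}"/>
```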

> Or it may simply be that we need to do some horizontal scaling and invest in 
> additional D-nodes.

You're going to do this often?

-jh-
