Here is a typical reverse query document. Others may be a bit more complex, for
example, an OR query matching on text strings or doing a cts word search:
<?xml version="1.0" encoding="UTF-8"?>
<MatchingQuery preferredTerm="c15fc" component="true" rule-based="true">
  <cts:or-query xmlns:cts="http://marklogic.com/cts">
    <cts:or-query>
      <cts:word-query>
        <cts:text xml:lang="en">c15fc</cts:text>
        <cts:option>case-insensitive</cts:option>
        <cts:option>diacritic-insensitive</cts:option>
        <cts:option>punctuation-insensitive</cts:option>
        <cts:option>whitespace-insensitive</cts:option>
        <cts:option>unstemmed</cts:option>
        <cts:option>wildcarded</cts:option>
      </cts:word-query>
    </cts:or-query>
  </cts:or-query>
</MatchingQuery>
The processing is this: for each document to be processed, we examine on the
order of 10-20 elements to see whether they match a reverse query, by getting
the node to be looked up and then doing:
cts:search(cts:reverse-query($node))
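Spelled out as a runnable sketch, the lookup above might look like the following. Note that cts:search takes the expression to search as its first argument; the /MatchingQuery scope and the sample <term> elements are assumptions for illustration, not the application's actual code.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-ins for the 10-20 elements examined per document. :)
let $nodes := (<term>c15fc</term>, <term>another value</term>)
for $node in $nodes
(: Find the stored query documents whose query would match $node.
   The /MatchingQuery scope is an assumption based on the sample above. :)
let $matches := cts:search(/MatchingQuery, cts:reverse-query($node))
return
  fn:concat(fn:string($node), ": ", fn:count($matches), " matching query documents")
```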
Our initial profiling used just one source document, run through the process
that uses these reverse queries (that is, we haven’t yet had a chance to
profile a larger run of documents).
I’m just starting my performance analysis here, but I don’t have any experience
with reverse queries, so I mostly wanted to make sure that there wasn’t
something fairly obvious that I should look for as a source of slowness before
digging into things more deeply. I’m pretty sure I’ll have to do deeper
profiling to see where the time is really being taken; there is a strong
possibility that it’s in our code and not really the reverse queries.
Cheers,
Eliot
--
Eliot Kimber
http://contrext.com
On 5/1/17, 10:00 AM, "Jason Hunter" <[email protected] on
behalf of [email protected]> wrote:
On May 1, 2017, at 20:45, Eliot Kimber <[email protected]> wrote:
>
> Using ML 8 we have an application that relies on reverse queries. The
> overall application is not performing as well as we need it to and our initial
> attempts at profiling show that the reverse queries are taking most of the
> time. We have about 120,000 separate reverse query documents.
What kind of reverse queries are they? Text? Geo? Simple? Complex?
> The “Inside MarkLogic” document suggests that reverse queries, properly
> indexed, should be quite fast. I have verified that we have the “fast reverse
> queries” index turned on.
>
> My question: What should I look for that might be causing our reverse
> queries to not be optimized?
What are you doing with them? Looping against 1,000 documents? Sample
code will help us all understand.
How fast are they running exactly?
How fast do you need them to run?
> Are there any other ML settings or server configurations that might
> affect reverse query performance? Are there particular query patterns that
> might be suboptimal? Is there a way that I can confirm that the reverse queries
> are performing as fast as possible?
The xdmp:plan function is your friend.
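As a minimal sketch of how xdmp:plan might be applied here: wrap the search expression in xdmp:plan and inspect the returned plan to see how the reverse query is resolved against the indexes. The $node value and the /MatchingQuery scope are assumptions for illustration, not the application's actual code.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for one element being looked up. :)
let $node := <term>c15fc</term>
return
  (: xdmp:plan returns the evaluation plan for the search expression
     instead of the search results. :)
  xdmp:plan(
    cts:search(/MatchingQuery, cts:reverse-query($node))
  )
```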
-jh-
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general