Re: [MarkLogic Dev General] Optimizing Reverse Queries

Eliot Kimber Mon, 01 May 2017 08:32:26 -0700

Here is a typical reverse query document. Others may be a bit more complex, for 
example, an OR query matching on text strings or doing a cts word search:


<?xml version="1.0" encoding="UTF-8"?>
<MatchingQuery preferredTerm="c15fc" component="true" rule-based="true" >
    <cts:or-query xmlns:cts="http://marklogic.com/cts">
        <cts:or-query>
            <cts:word-query>
                <cts:text xml:lang="en">c15fc</cts:text>
                <cts:option>case-insensitive</cts:option>
                <cts:option>diacritic-insensitive</cts:option>
                <cts:option>punctuation-insensitive</cts:option>
                <cts:option>whitespace-insensitive</cts:option>
                <cts:option>unstemmed</cts:option>
                <cts:option>wildcarded</cts:option>
            </cts:word-query>
        </cts:or-query>
    </cts:or-query>
</MatchingQuery>

For each document to be processed, we examine on the order of 10-20 elements 
to see which reverse queries match them. For each element we get the node to 
be looked up and then do:

cts:search(fn:doc(), cts:reverse-query($node))
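
A minimal sketch of that per-document loop, assuming the stored queries have 
MatchingQuery as their root element, as in the sample above. The source URI 
and the "para" element name are hypothetical placeholders, not taken from the 
real application.

```xquery
xquery version "1.0-ml";

(: Hypothetical input: the URI and the element name are placeholders. :)
let $doc := fn:doc("/content/source-doc.xml")
for $node in $doc//para
(: Find the stored query documents whose query would match this node. :)
let $matches := cts:search(fn:doc(), cts:reverse-query($node))
return
  for $m in $matches
  (: preferredTerm is the attribute shown on the sample MatchingQuery. :)
  return fn:string($m/MatchingQuery/@preferredTerm)
```

With 10-20 nodes per source document, this issues 10-20 separate searches per 
document, so the per-search cost is the number worth measuring first.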

Our initial profiling covered just one source document run through the 
process that uses these reverse queries; we haven’t yet had a chance to 
profile a larger run of documents.

I’m just starting my performance analysis and have no experience with reverse 
queries, so I mostly wanted to make sure there wasn’t something fairly obvious 
to look for as a source of slowness before digging in more deeply. I’m pretty 
sure I’ll have to do deeper profiling to see where the time is really going; 
there is a strong possibility that it’s in our code and not really in the 
reverse queries.
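
One way to separate the cost of the search from the cost of the surrounding 
code is to time a single reverse query on its own and look at its plan with 
xdmp:plan, as suggested in the quoted message below. This is only a sketch: 
the URI and the "para" element name are hypothetical placeholders, and the 
timing relies on the let clauses being evaluated in order.

```xquery
xquery version "1.0-ml";

(: Hypothetical input: the URI and the element name are placeholders. :)
let $node := (fn:doc("/content/source-doc.xml")//para)[1]
let $start := xdmp:elapsed-time()
(: fn:count forces the search to be fully evaluated before timing stops. :)
let $hits := fn:count(cts:search(fn:doc(), cts:reverse-query($node)))
let $elapsed := xdmp:elapsed-time() - $start
return (
  (: How the server plans to resolve the search against the indexes. :)
  xdmp:plan(cts:search(fn:doc(), cts:reverse-query($node))),
  (: Time spent in the search alone, without the surrounding code. :)
  <timing hits="{$hits}" elapsed="{$elapsed}"/>
)
```

If the elapsed time for one search is small compared with the total time per 
source document, the slowness is more likely in the calling code than in the 
reverse queries themselves.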

Cheers,

Eliot

--
Eliot Kimber
http://contrext.com
 


On 5/1/17, 10:00 AM, "Jason Hunter" <[email protected] on 
behalf of [email protected]> wrote:

    On May 1, 2017, at 20:45, Eliot Kimber <[email protected]> wrote:
    > 
    > Using ML 8 we have an application that relies on reverse queries. The
    > overall application is not performing as well as we need it to and our
    > initial attempts at profiling show that the reverse queries are taking
    > most of the time. We have about 120,000 separate reverse query documents.
    
    What kind of reverse queries are they?  Text?  Geo?  Simple?  Complex?
    
    > The “Inside MarkLogic” document suggests that reverse queries, properly
    > indexed, should be quite fast. I have verified that we have the “fast
    > reverse queries” index turned on.
    > 
    > My question: What should I look for that might be causing our reverse
    > queries to not be optimized?
    
    What are you doing with them?  Looping against 1,000 documents?  Sample
    code will help us all understand.
    
    How fast are they running exactly?
    
    How fast do you need them to run?
    
    > Are there any other ML settings or server configurations that might
    > affect reverse query performance? Are there particular query patterns
    > that might be suboptimal? Is there a way that I can confirm that the
    > reverse queries are performing as fast as possible?
    
    The xdmp:plan function is your friend.
    
    -jh-
    
    _______________________________________________
    General mailing list
    [email protected]
    Manage your subscription at: 
    http://developer.marklogic.com/mailman/listinfo/general
    

