Re: [MarkLogic Dev General] Registered Query Best Practices

Michael Blakeley Wed, 31 Jul 2013 11:20:25 -0700

If that profile is correct, I'd be much more worried about the cts:query 
constructor. I have a hard time getting that much elapsed time out of a 
cts:query constructor.


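(: Build an and-query of 1000 random range-query terms.
   The element name is one that deliberately does not exist. :)
declare variable $q := cts:and-query(
  (1 to 1000) ! cts:element-range-query(
    xs:QName('does-not-exist'), '=',
    xdmp:integer-to-hex(xdmp:random()))) ;

(: Profile only the cts:query constructor, passing the serialized
   query in as an external variable. :)
prof:eval('
  declare variable $qnode as element() external ;
  cts:query($qnode)',
  (xs:QName('qnode'),
   document { $q }/*))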

On my laptop, the profile shows cts:query at 99% of shallow, with anywhere from 
5-15 ms total. And that's with 1000 terms, which seems like a lot. But I'm 
testing against a nearly empty database, which might make a difference.

Is cts:query still a hotspot if you drop the registered-query code entirely?

Is there a particular cts:query term type that triggers this?

Does xdmp:query-meters() show anything indicating database lookups?
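
One way to check the questions above is to run the constructor alone, with no cts:register and no cts:search, and look at the meters afterwards. This is only a minimal sketch: the small and-query is a hypothetical stand-in for the stored, serialized query, and the collection name and word are made up for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for the stored, serialized query. :)
let $qnode := document {
  cts:and-query((
    cts:word-query("example"),
    cts:collection-query("sample")))
}/*
(: Deserialize only: no cts:register and no cts:search. :)
let $q := cts:query($qnode)
return (
  (: Reference $q so the constructor is not skipped as unused. :)
  fn:exists($q),
  (: Elapsed time since the start of this request. :)
  xdmp:elapsed-time(),
  (: Cache hits and misses recorded so far for this request. :)
  xdmp:query-meters()
)
```

If the meters show cache misses or lookups for this request, that would suggest the constructor is touching the database rather than just building an in-memory object.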

-- Mike

On 31 Jul 2013, at 09:37 , Ron Hitchens <[email protected]> wrote:

> 
>   So here's a little more color on this, if anyone is still
> interested.  When I profile this code, where $query is a fairly
> complex serialized query that was previously computed and stored
> in a database:
> 
> declare variable $q1 := cts:registered-query (cts:register (cts:query ($query)), "unfiltered");
> 
> cts:search (fn:doc(), $q1)[1 to 5]
> 
> The top two items on the profile output are:
> 
> Shallow%  Shallow usecs   Deep%  Deep usecs  Expression
> 80        125000          90     140000      cts:query($query)
> 10         16000         100     156000      cts:registered-query (cts:register (cts:query ($query)), "unfiltered")
> 
>   Time spent on the actual search is so small it rounded to zero.
> 
>   Doing this repeatedly yields similar timing, so it's not a cold
> cache situation or anything like that.
> 
>   Profiling this:
> 
> declare variable $q2 := cts:registered-query (9156609332438599120, "unfiltered");
> 
> cts:search (fn:doc(), $q2)[1 to 5]
> 
>   Yields times too fast to measure (all rounded to zero)
> 
>   So, the potentially expensive to create query is being
> built every time and possibly being re-registered as well,
> given that cts:registered-query is taking a non-trivial amount
> of time.
> 
> On Jul 31, 2013, at 8:38 AM, Ron Hitchens <[email protected]> wrote:
> 
>> 
>>  The overall entitlement query on each request is composed
>> of many sub-queries, some of which are static and registered,
>> some of which are dependent on the current time.  But even the
>> static ones are not finite; new ones can be created at any time
>> as part of a new entitlement definition.
>> 
>>  I'm working on a scheme to catch and re-register all the
>> static queries in a given query tree when a search fails due
>> to a missing registration.  That should lazily re-register
>> on first use after a server restart as well.
>> 
>> ---
>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>    +44 7879 358 212 (voice)          http://www.ronsoft.com
>>    +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
>> "No amount of belief establishes any fact." -Unknown
>> 
>> 
>> On Jul 30, 2013, at 8:30 PM, Geert Josten <[email protected]> wrote:
>> 
>>> Hi Ron,
>>> 
>>> Are your queries such that you would have a finite number of sub-queries
>>> if you broke them into smaller subparts? Perhaps you can combine
>>> multiple registered queries.
>>> 
>>> Cheers,
>>> Geert
>>> 
>>>> -----Original message-----
>>>> From: [email protected] [mailto:general-
>>>> [email protected]] On behalf of Ron Hitchens
>>>> Sent: Tuesday, 30 July 2013 2:29
>>>> To: MarkLogic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
>>>> 
>>>> 
>>>> Hi Geert,
>>>> 
>>>> I've done something before where we stored reg ids in a map for
>>>> easy re-use.  In that case, there was a 1:1 correspondence between
>>>> the reg id and a meaningful business domain number.  On this project
>>>> that's not the case.
>>>> 
>>>> Also, there is not a finite set of queries that need to be registered,
>>>> so it's not feasible to pre-register everything once.  New ones can be
>>>> created dynamically.  And the complicated queries are persisted in
>>>> another database and can be referenced later.  This means the queries
>>>> which should be registered will persist across server restarts.  Which
>>>> means there must be a way to register the queries on first use, then
>>>> make use of those registered queries on subsequent requests.
>>>> 
>>>> The re-register-before-each-use pattern solves that nicely, but not
>>>> if the query construction cost must be re-paid each time.  It looks
>>>> like the robust solution is going to have to be catching exceptions
>>>> for unregistered queries and reconstructing the registrations.  It's
>>>> a shame because that is going to add unnecessary complexity to the
>>>> code.
>>>> 
>>>> 
>>>> 
>>>> On Jul 29, 2013, at 8:15 PM, Geert Josten <[email protected]> wrote:
>>>> 
>>>>> Hi Ron,
>>>>> 
>>>>> I recently saw a strategy where they deliberately took a different
>>>>> approach. In their case the calculation of the queries was not
>>>>> straightforward and could run into 30k search terms. Additionally,
>>>>> registering the query, and warming up the cache by doing one initial
>>>>> search after registering each query, took the most time. They were
>>>>> searching roughly 40 million docs. The searches themselves were
>>>>> sub-second.
>>>>> 
>>>>> Their approach was to store all registered query IDs somewhere, and
>>>>> have them readily available at actual search time. They also used a
>>>>> try/catch to catch unregistered queries, though in their case these
>>>>> shouldn't actually occur, and they dramatically pulled down the
>>>>> average on performance tests.
>>>>> 
>>>>> How much chance is there that a query is unregistered, if you would
>>>>> prepare all queries beforehand?
>>>>> 
>>>>> Cheers,
>>>>> Geert
>>>>> 
>>>>>> -----Original message-----
>>>>>> From: [email protected] [mailto:general-
>>>>>> [email protected]] On behalf of Michael Blakeley
>>>>>> Sent: Monday, 29 July 2013 21:08
>>>>>> To: MarkLogic Developer Discussion
>>>>>> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
>>>>>> 
>>>>>> I think you're using registered query as intended. That behavior
>>>>>> sounds odd to me. I would expect (2) to be cheap, just a hash
>>>>>> operation on the query terms, and I would expect (3) to be the
>>>>>> expensive step.
>>>>>> 
>>>>>> So I would contact support and see what they think.
>>>>>> 
>>>>>> -- Mike
>>>>>> 
>>>>>> On 29 Jul 2013, at 11:03 , Ron Hitchens <[email protected]> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> What is the best practice these days for using registered
>>>>>>> queries?  I was under the impression that the pattern should be:
>>>>>>> 
>>>>>>> 1) Create your query:
>>>>>>> $query := cts:and-query ((blah blah blah))
>>>>>>> 2) Register it and make a registered query from it in one step:
>>>>>>> $reg-query := cts:registered-query (cts:register ($query), "unfiltered")
>>>>>>> 3) Use it in a search:
>>>>>>> cts:search (fn:doc(), $reg-query)
>>>>>>> 
>>>>>>> The theory being that if the cts:query described by $query is
>>>>>>> already registered, then the registration is essentially a no-op
>>>>>>> and you'll get back the same ID.  And doing this every time ensures
>>>>>>> that if the registered query has been evicted for some reason then
>>>>>>> it's re-registered and all is well.
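
The three steps above can be written out as one module. This is a minimal sketch rather than the poster's actual code: the word and collection terms are placeholders for "blah blah blah", and the [1 to 5] slice is borrowed from the later message in this thread.

```xquery
xquery version "1.0-ml";

(: Step 1: build the query. These terms are placeholders. :)
let $query := cts:and-query((
  cts:word-query("entitlement"),
  cts:collection-query("published")))

(: Step 2: register it and wrap the returned ID in one step.
   The pattern assumes that registering an identical query
   again hands back the same ID. :)
let $reg-query :=
  cts:registered-query(cts:register($query), "unfiltered")

(: Step 3: use the registered query in a search. :)
return cts:search(fn:doc(), $reg-query)[1 to 5]
```

Note that Step 1 runs on every request in this pattern, so whatever it costs to build $query is paid each time, even when the registration itself is a no-op.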
>>>>>>> 
>>>>>>> It's a nice theory but seems to be based on the assumption that
>>>>>>> creating a cts:query object is very cheap.  Unfortunately, I'm
>>>>>>> finding that this is often not the case, especially when there are
>>>>>>> lots of documents in the database.  I have a test case where
>>>>>>> performing Step 2 above on a moderately complicated query takes
>>>>>>> roughly 200ms every time.  Others take even longer and all seem to
>>>>>>> be proportional to database size.  But running Step 3 with
>>>>>>> cts:registered-query(<regid>) is very, very fast (~0ms).
>>>>>>> Re-creating the query for re-registering every time is destroying
>>>>>>> the benefit of using a registered query.
>>>>>>> 
>>>>>>> I can obviously save the registration ID obtained from calling
>>>>>>> cts:register and then make a cts:registered-query each time, but
>>>>>>> then I'm not protected from the query becoming unregistered.  And
>>>>>>> there is no lightweight way to test if an ID is still registered.
>>>>>>> The only way I know to make this robust is to put a loop and
>>>>>>> try/catch around the code that does the search.  But that requires
>>>>>>> passing along enough context to re-construct and re-register the
>>>>>>> queries (there can be dozens of them in this case).  This is
>>>>>>> obviously a lot harder than building the complex query in one
>>>>>>> module and then passing it along to the search code somewhere else.
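
The save-the-ID-and-catch approach described above might look roughly like the following. It is a sketch under stated assumptions: local:search-registered is a hypothetical helper, the serialized query node is passed along as the "context" needed to rebuild the registration, and the error code is assumed to be XDMP-UNREGISTERED. xdmp:eager is used on the assumption that a lazily evaluated search could otherwise raise the error outside the try block.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: search with a saved registration ID and
   re-register from the serialized query only if the ID is gone. :)
declare function local:search-registered(
  $reg-id as xs:unsignedLong,
  $qnode as element()
) as document-node()*
{
  try {
    (: Force evaluation so a failure surfaces inside the try. :)
    xdmp:eager(
      cts:search(fn:doc(),
        cts:registered-query($reg-id, "unfiltered"))[1 to 5])
  } catch ($e) {
    (: Assumed error code for a missing registration. :)
    if ($e/error:code eq "XDMP-UNREGISTERED") then
      cts:search(fn:doc(),
        cts:registered-query(
          cts:register(cts:query($qnode)), "unfiltered"))[1 to 5]
    else xdmp:rethrow()
  }
};

(: Usage with a placeholder query; in practice the ID and the
   serialized query would both come from storage. :)
let $qnode := document { cts:word-query("example") }/*
let $reg-id := cts:register(cts:query($qnode))
return local:search-registered($reg-id, $qnode)
```

In this shape the cts:query constructor runs only on the failure path. A fuller version would also hand the new ID back to the caller so it can be saved for later requests.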
>>>>>>> 
>>>>>>> What's the generally accepted best usage pattern for registered
>>>>>>> queries?  And is it my imagination or has the cost of running
>>>>>>> queries been moving from query evaluation into query construction?
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> General mailing list
>>>>>>> [email protected]
>>>>>>> http://developer.marklogic.com/mailman/listinfo/general
