Re: [MarkLogic Dev General] Registered Query Best Practices

Ron Hitchens Wed, 31 Jul 2013 09:37:32 -0700

   So here's a little more color on this, if anyone is still
interested.  When I profile this code, where $query is a fairly
complex serialized query that was previously computed and stored
in a database:


declare variable $q1 :=
    cts:registered-query (cts:register (cts:query ($query)), "unfiltered");
 
cts:search (fn:doc(), $q1)[1 to 5]

The top two items on the profile output are:

Shallow%  Shallow usecs  Deep%  Deep usecs  Expression
 80       125000          90    140000      cts:query($query)
 10        16000         100    156000      cts:registered-query (cts:register (cts:query ($query)), "unfiltered")

   Time spent on the actual search is so small it rounded to zero.

   Doing this repeatedly yields similar timing, so it's not a cold
cache situation or anything like that.

   Profiling this:

declare variable $q2 :=
    cts:registered-query (9156609332438599120, "unfiltered");
 
cts:search (fn:doc(), $q2)[1 to 5]

   Yields times too fast to measure (all rounded to zero).

   So the potentially expensive-to-create query is being built
every time, and possibly re-registered as well, given that
cts:registered-query is taking a non-trivial amount of time.
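
   For anyone following along, here is a rough sketch of the
alternative: search by a saved registration ID and only rebuild and
re-register the query when the server reports it as unregistered.
The names $saved-id and local:search-by-id are made up for
illustration, and how the ID gets saved between requests is left
open.  It assumes the XDMP-UNREGISTERED error code is what comes
back for a missing registration.  Treat it as a sketch, not tested
production code.

```xquery
xquery version "1.0-ml";

declare namespace error = "http://marklogic.com/xdmp/error";

(: Assumptions: $query is the stored serialized query from the example
   above, and $saved-id is a registration ID kept from an earlier
   cts:register call. Both names are hypothetical. :)
declare variable $query as element() external;
declare variable $saved-id as xs:unsignedLong external;

(: Search by registration ID. xdmp:eager forces the search to be
   evaluated here, so a missing registration fails inside the try
   block below rather than later, when the results are consumed. :)
declare function local:search-by-id ($id as xs:unsignedLong) as node()*
{
    xdmp:eager (
        cts:search (fn:doc(), cts:registered-query ($id, "unfiltered"))[1 to 5]
    )
};

try {
    (: Fast path: no query construction, no registration. :)
    local:search-by-id ($saved-id)
} catch ($e) {
    if ($e/error:code = "XDMP-UNREGISTERED")
    then
        (: Slow path: pay the construction cost once, re-register, retry. :)
        local:search-by-id (cts:register (cts:query ($query)))
    else
        xdmp:rethrow()
}
```

   The slow path only runs after a registration has been lost (for
example after a restart), so the cts:query cost is not paid on every
request.  The sketch does not save the new ID anywhere; a real
version would need to do that as well.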

On Jul 31, 2013, at 8:38 AM, Ron Hitchens <[email protected]> wrote:

> 
>   The overall entitlement query on each request is composed
> of many sub-queries, some of which are static and registered,
> some of which are dependent on the current time.  But even the
> static ones are not finite, new ones can be created at any time
> as part of a new entitlement definition.
> 
>   I'm working on a scheme to catch and re-register all the
> static queries in a given query tree when a search fails due
> to a missing registration.  That should lazily re-register
> on first use after a server restart as well.
> 
> ---
> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>     +44 7879 358 212 (voice)          http://www.ronsoft.com
>     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
> "No amount of belief establishes any fact." -Unknown
> 
> 
> On Jul 30, 2013, at 8:30 PM, Geert Josten <[email protected]> wrote:
> 
>> Hi Ron,
>> 
>> Are your queries such that you would have a finite number of sub-queries
>> if you broke them into smaller subparts? Perhaps you can combine
>> multiple registered queries.
>> 
>> Cheers,
>> Geert
>> 
>>> -----Original message-----
>>> From: [email protected] [mailto:general-
>>> [email protected]] On behalf of Ron Hitchens
>>> Sent: Tuesday, 30 July 2013 2:29
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
>>> 
>>> 
>>> Hi Geert,
>>> 
>>>  I've done something before where we stored reg ids in a map for
>>> easy re-use.  In that case, there was a 1:1 correspondence between
>>> the reg id and a meaningful business domain number.  On this project
>>> that's not the case.
>>> 
>>>  Also, there is not a finite set of queries that need to be registered
>>> so it's not feasible to pre-register everything once.  New ones can be
>>> created dynamically.  And the complicated queries are persisted in
>>> another database and can be referenced later.  This means the queries
>>> which should be registered will persist across server restarts.  Which
>>> means there must be a way to register the queries on first use, then
>>> make use of those registered queries on subsequent requests.
>>> 
>>>  The re-register-before-each-use pattern solves that nicely, but not if
>>> the query construction cost must be re-paid each time.  It looks like
>>> the robust solution is going to have to be catching exceptions for
>>> unregistered queries and reconstructing the registrations.  It's a shame
>>> because that is going to add unnecessary complexity to the code.
>>> 
>>> ---
>>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>>    +44 7879 358 212 (voice)          http://www.ronsoft.com
>>>    +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
>>> "No amount of belief establishes any fact." -Unknown
>>> 
>>> 
>>> On Jul 29, 2013, at 8:15 PM, Geert Josten <[email protected]> wrote:
>>> 
>>>> Hi Ron,
>>>> 
>>>> I recently saw a strategy where they deliberately took a different
>>>> approach. In their case the calculation of the queries was not
>>>> straightforward and could run into 30k search terms. Additionally,
>>>> registering the query, and warming up the cache by doing one initial
>>>> search after registering each query, took the most time. They were
>>>> searching roughly 40 million docs. The searches themselves were
>>>> sub-second.
>>>> 
>>>> Their approach was to store all registered query IDs somewhere, and
>>>> have them readily available at actual search time. They also used a
>>>> try/catch to catch unregistered queries, though in their case these
>>>> shouldn't actually occur, and they dramatically pulled down the
>>>> average on performance tests.
>>>> 
>>>> How much chance is there that a query is unregistered, if you would
>>>> prepare all queries beforehand?
>>>> 
>>>> Cheers,
>>>> Geert
>>>> 
>>>>> -----Original message-----
>>>>> From: [email protected] [mailto:general-
>>>>> [email protected]] On behalf of Michael Blakeley
>>>>> Sent: Monday, 29 July 2013 21:08
>>>>> To: MarkLogic Developer Discussion
>>>>> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
>>>>> 
>>>>> I think you're using registered query as intended. That behavior
>>>>> sounds odd to me. I would expect (2) to be cheap, just a hash
>>>>> operation on the query terms, and I would expect (3) to be the
>>>>> expensive step.
>>>>> 
>>>>> So I would contact support and see what they think.
>>>>> 
>>>>> -- Mike
>>>>> 
>>>>> On 29 Jul 2013, at 11:03 , Ron Hitchens <[email protected]> wrote:
>>>>> 
>>>>>> 
>>>>>> What is the best practice these days for using registered
>>>>>> queries?  I was under the impression that the pattern should be:
>>>>>> 
>>>>>> 1) Create your query:
>>>>>> $query := cts:and-query ((blah blah blah))
>>>>>> 2) Register it and make a registered query from it in one step:
>>>>>> $reg-query := cts:registered-query (cts:register ($query), "unfiltered")
>>>>>> 3) Use it in a search:
>>>>>> cts:search (fn:doc(), $reg-query)
>>>>>> 
>>>>>> The theory being that if the cts:query described by $query is
>>>>>> already registered, then the registration is essentially a no-op
>>>>>> and you'll get back the same ID.  And doing this every time ensures
>>>>>> that if the registered query has been evicted for some reason then
>>>>>> it's re-registered and all is well.
>>>>>> 
>>>>>> It's a nice theory but seems to be based on the assumption that
>>>>>> creating a cts:query object is very cheap.  Unfortunately, I'm
>>>>>> finding that this is often not the case, especially when there are
>>>>>> lots of documents in the database.  I have a test case where
>>>>>> performing Step 2 above on a moderately complicated query takes
>>>>>> roughly 200ms every time.  Others take even longer and all seem to be
>>>>>> proportional to database size.  But running Step 3 with
>>>>>> cts:registered-query(<regid>) is very, very fast (~0ms).  Re-creating
>>>>>> the query for re-registering every time is destroying the benefit of
>>>>>> using a registered query.
>>>>>> 
>>>>>> I can obviously save the registration ID obtained from calling
>>>>>> cts:register and then make a cts:registered-query each time, but
>>>>>> then I'm not protected from the query becoming unregistered.  And
>>>>>> there is no lightweight way to test if an ID is still registered.
>>>>>> The only way I know to make this robust is to put a loop and
>>>>>> try/catch around the code that does the search.  But that requires
>>>>>> passing along enough context to re-construct and re-register the
>>>>>> queries (there can be dozens of them in this case).  This is
>>>>>> obviously a lot harder than building the complex query in one module
>>>>>> and then passing it along to the search code somewhere else.
>>>>>> 
>>>>>> What's the generally accepted best usage pattern for registered
>>>>>> queries?  And is it my imagination or has the cost of running
>>>>>> queries been moving from query evaluation into query construction?
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> ---
>>>>>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>>>>>  +44 7879 358 212 (voice)          http://www.ronsoft.com
>>>>>>  +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
>>>>>> "No amount of belief establishes any fact." -Unknown
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> General mailing list
>>>>>> [email protected]
>>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>>> 
> 

