Re: [MarkLogic Dev General] Registered Query Best Practices

Ron Hitchens Wed, 31 Jul 2013 00:39:09 -0700

   The overall entitlement query on each request is composed
of many sub-queries, some of which are static and registered,
and some of which depend on the current time.  But even the
static ones are not a finite set; new ones can be created at
any time as part of a new entitlement definition.


   I'm working on a scheme that catches the error raised when a
search hits a missing registration and then re-registers all the
static queries in that query tree.  That should also lazily
re-register them on first use after a server restart.
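A minimal sketch of that lazy re-registration scheme, modeled in Python rather than MarkLogic XQuery. FakeServer, Unregistered, StaticNode and search_tree are hypothetical stand-ins: the server's registry is a dict that a restart clears, the query tree is flattened to a list of static nodes, and the time-dependent sub-queries (rebuilt on every request anyway) are left out.

```python
# Hypothetical model, not MarkLogic code: lazily re-register the static
# sub-queries of an entitlement query when a search reports a missing registration.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class Unregistered(Exception):
    """Stand-in for the server's unregistered-query error."""


class FakeServer:
    """Toy server whose in-memory registry is lost on restart."""

    def __init__(self) -> None:
        self.registry: Dict[int, str] = {}
        self.next_id = 1

    def register(self, definition: str) -> int:
        reg_id = self.next_id
        self.next_id += 1
        self.registry[reg_id] = definition
        return reg_id

    def restart(self) -> None:
        self.registry.clear()  # registrations do not survive a restart

    def search(self, reg_ids: List[int]) -> str:
        for reg_id in reg_ids:
            if reg_id not in self.registry:
                raise Unregistered(reg_id)
        return "results"


@dataclass
class StaticNode:
    build: Callable[[], str]  # expensive query construction
    reg_id: Optional[int] = None

    def ensure_registered(self, server: FakeServer, force: bool = False) -> int:
        # Build and register only when no ID is cached or a refresh is forced.
        if self.reg_id is None or force:
            self.reg_id = server.register(self.build())
        return self.reg_id


def search_tree(server: FakeServer, nodes: List[StaticNode]) -> str:
    """Search with cached IDs; on a missing registration, re-register every
    static node once and retry. A second failure propagates to the caller."""
    try:
        return server.search([n.ensure_registered(server) for n in nodes])
    except Unregistered:
        ids = [n.ensure_registered(server, force=True) for n in nodes]
        return server.search(ids)
```

In this model the expensive build() runs once per node per server lifetime, plus once more after each restart; a steady-state search only reuses stored IDs. Real code would need to match the server's specific unregistered-query error (MarkLogic reports it as XDMP-UNREGISTERED) and let every other error propagate.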

---
Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
     +44 7879 358 212 (voice)          http://www.ronsoft.com
     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown


On Jul 30, 2013, at 8:30 PM, Geert Josten <[email protected]> wrote:

> Hi Ron,
> 
> Are your queries such that you would have a finite number of sub-queries
> if you broke them into smaller parts? Perhaps you could combine
> multiple registered queries.
> 
> Cheers,
> Geert
> 
>> -----Original Message-----
>> From: [email protected] [mailto:general-
>> [email protected]] On Behalf Of Ron Hitchens
>> Sent: Tuesday, 30 July 2013 2:29
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
>> 
>> 
>> Hi Geert,
>> 
>>   I've done something before where we stored reg ids in a map for
>> easy re-use.  In that case, there was a 1:1 correspondence between
>> the reg id and a meaningful business domain number.  On this project
>> that's not the case.
>> 
>>   Also, there is not a finite set of queries that need to be registered,
>> so it's not feasible to pre-register everything once.  New ones can be
>> created dynamically.  And the complicated queries are persisted in
>> another database and can be referenced later.  This means the queries
>> that should be registered persist across server restarts, while their
>> registrations do not, so there must be a way to register the queries
>> on first use and then make use of those registered queries on
>> subsequent requests.
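Registering on first use and reusing the ID afterwards amounts to a cache keyed by something stable. Since there is no 1:1 business number here, one assumption is to key on the identifier of the persisted query definition. A Python sketch under that assumption; RegistrationCache is a hypothetical name, and load_and_build and register are hypothetical callables standing in for loading the stored definition, building the cts:query, and calling cts:register.

```python
from typing import Callable, Dict


class RegistrationCache:
    """Hypothetical cache: stable definition key -> registration ID."""

    def __init__(self, load_and_build: Callable[[str], str],
                 register: Callable[[str], int]) -> None:
        self._load_and_build = load_and_build  # loads a stored definition, builds the query
        self._register = register              # wraps the server's register call
        self._ids: Dict[str, int] = {}

    def get_id(self, key: str) -> int:
        """Return the cached ID, paying the construction cost only on first use."""
        if key not in self._ids:
            query = self._load_and_build(key)
            self._ids[key] = self._register(query)
        return self._ids[key]

    def invalidate(self, key: str) -> None:
        """Forget an ID the server no longer recognizes (e.g. after a restart)."""
        self._ids.pop(key, None)
```

invalidate() is the hook for the eviction case: the cache only learns that an ID is stale once the server rejects it, and the cache lives in one process, so it starts empty whenever that process does.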
>> 
>>   The re-register-before-each-use pattern solves that nicely, but not if
>> the query construction cost must be re-paid each time.  It looks like the
>> robust solution is going to have to be catching exceptions for unregistered
>> queries and reconstructing the registrations.  It's a shame, because that
>> is going to add unnecessary complexity to the code.
>> 
>> 
>> 
>> On Jul 29, 2013, at 8:15 PM, Geert Josten <[email protected]> wrote:
>> 
>>> Hi Ron,
>>> 
>>> I recently saw a setup where the team deliberately took a different
>>> approach. In their case, computing the queries was not straightforward,
>>> and a query could run to 30k search terms. Registering each query, and
>>> then warming up the cache with one initial search after registering it,
>>> took most of the time. They were searching roughly 40 million docs; the
>>> searches themselves were sub-second.
>>> 
>>> Their approach was to store all registered query id's somewhere, and
> have
>>> them readily available at actual search time. They also used a try
> catch
>>> to catch unregistered queries, though in their case they shouldn't
>>> actually occur, and these dramatically pulled down the average on
>>> performance tests.
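To show why rare failures can dominate an average, here is a small Python sketch computing mean and median latency when a few searches also pay a large recovery cost (re-registering plus warming the cache). latency_summary and every number used with it are hypothetical, not measurements from that project.

```python
import statistics
from typing import Tuple


def latency_summary(n_searches: int, n_failures: int,
                    t_search: float, t_recover: float) -> Tuple[float, float]:
    """Mean and median latency when n_failures of n_searches also pay a
    recovery cost (re-register the query, then warm the cache)."""
    if not 0 <= n_failures <= n_searches or n_searches == 0:
        raise ValueError("need 0 <= n_failures <= n_searches and n_searches > 0")
    samples = ([t_search + t_recover] * n_failures
               + [t_search] * (n_searches - n_failures))
    return statistics.mean(samples), statistics.median(samples)


# Hypothetical numbers: 100 searches of 0.2 s, one of which pays 30 s to recover.
mean, median = latency_summary(100, 1, 0.2, 30.0)
print(f"mean={mean:.2f}s median={median:.2f}s")  # prints mean=0.50s median=0.20s
```

A single slow recovery in a hundred searches leaves the typical (median) search untouched but lifts the mean to two and a half times the normal search time.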
>>> 
>>> How likely is it that a query becomes unregistered if you prepare all
>>> queries beforehand?
>>> 
>>> Cheers,
>>> Geert
>>> 
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:general-
>>>> [email protected]] On Behalf Of Michael Blakeley
>>>> Sent: Monday, 29 July 2013 21:08
>>>> To: MarkLogic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
>>>> 
>>>> I think you're using registered queries as intended. That behavior
>>>> sounds odd to me. I would expect (2) to be cheap, just a hash operation
>>>> on the query terms, and I would expect (3) to be the expensive step.
>>>> 
>>>> So I would contact support and see what they think.
>>>> 
>>>> -- Mike
>>>> 
>>>> On 29 Jul 2013, at 11:03 , Ron Hitchens <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> What is the best practice these days for using registered
>>>>> queries?  I was under the impression that the pattern should be:
>>>>> 
>>>>> 1) Create your query:
>>>>>  $query := cts:and-query ((blah blah blah))
>>>>> 2) Register it and make a registered query from it in one step:
>>>>>  $reg-query := cts:registered-query (cts:register ($query), "unfiltered")
>>>>> 3) Use it in a search:
>>>>>  cts:search (fn:doc(), $reg-query)
>>>>> 
>>>>> The theory is that if the cts:query described by $query is
>>>>> already registered, then the registration is essentially a no-op
>>>>> and you'll get back the same ID.  And doing this every time ensures
>>>>> that if the registered query has been evicted for some reason, then
>>>>> it's re-registered and all is well.
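The theory can be modeled in a few lines if the registration ID is derived from the query's canonical form, which is roughly what Mike suggests in his reply (a hash of the query terms). This is a hypothetical Python model, not MarkLogic's actual implementation: IdempotentRegistry and the string serialization are assumptions, and the 64-bit ID only mirrors the xs:unsignedLong that cts:register returns.

```python
import hashlib
from typing import Dict


class IdempotentRegistry:
    """Model: the ID is a hash of the query's canonical serialization, so
    registering an equal query again returns the same ID (collisions ignored)."""

    def __init__(self) -> None:
        self.entries: Dict[int, str] = {}

    def register(self, canonical_query: str) -> int:
        digest = hashlib.sha256(canonical_query.encode("utf-8")).digest()
        reg_id = int.from_bytes(digest[:8], "big")  # 64-bit unsigned ID
        self.entries.setdefault(reg_id, canonical_query)  # no-op if already present
        return reg_id

    def evict(self, reg_id: int) -> None:
        self.entries.pop(reg_id, None)
```

Registering is cheap and idempotent here, and re-registering after an eviction restores the same ID. What the model cannot capture is the cost reported next: canonical_query has to be fully built before register() is called, and in this case that construction is the expensive part.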
>>>>> 
>>>>> It's a nice theory, but it seems to be based on the assumption that
>>>>> creating a cts:query object is very cheap.  Unfortunately, I'm finding
>>>>> that this is often not the case, especially when there are lots of
>>>>> documents in the database.  I have a test case where performing Step 2
>>>>> above on a moderately complicated query takes roughly 200ms every time.
>>>>> Others take even longer, and all seem to be proportional to database size.
>>>>> But running Step 3 with cts:registered-query(<regid>) is very, very
>>>>> fast (~0ms).  Re-creating the query to re-register it every time
>>>>> destroys the benefit of using a registered query.
>>>>> 
>>>>> I can obviously save the registration ID obtained from calling
>>>>> cts:register and then make a cts:registered-query each time, but then
>>>>> I'm not protected from the query becoming unregistered.  And there is
>>>>> no lightweight way to test whether an ID is still registered.  The only
>>>>> way I know to make this robust is to put a loop and try/catch around the
>>>>> code that does the search.  But that requires passing along enough
>>>>> context to reconstruct and re-register the queries (there can be
>>>>> dozens of them in this case).  This is obviously a lot harder than
>>>>> building the complex query in one module and then passing it along
>>>>> to the search code somewhere else.
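One way to avoid handing the construction context to the search code is to hand over a closure instead: the module that builds the queries passes along the registration IDs together with a function that can recreate them. A Python sketch of that idea with a bounded retry loop; QueryHandle, search_with_retry and Unregistered are hypothetical names, and the closure is a swapped-in technique standing for whatever context the real code would carry.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class Unregistered(Exception):
    """Stand-in for the server's unregistered-query error."""


class QueryHandle:
    """Registration IDs plus the closure that can recreate them, so search
    code can recover without knowing how the queries were built."""

    def __init__(self, reregister: Callable[[], List[int]]) -> None:
        self._reregister = reregister
        self.ids = reregister()  # initial build and registration

    def refresh(self) -> None:
        self.ids = self._reregister()


def search_with_retry(handle: QueryHandle,
                      search: Callable[[List[int]], T],
                      attempts: int = 2) -> T:
    """Run search with the handle's IDs, re-registering between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return search(handle.ids)
        except Unregistered:
            if attempt == attempts:
                raise  # still failing: not a simple eviction, so give up
            handle.refresh()
    raise RuntimeError("unreachable")
```

The search side sees only IDs and a refresh() hook, and the retry bound keeps a persistent failure, such as a query that can never be registered, from looping forever.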
>>>>> 
>>>>> What's the generally accepted best usage pattern for registered
>>>>> queries?  And is it my imagination, or has the cost of running queries
>>>>> been moving from query evaluation into query construction?
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> [email protected]
>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>>> 