Re: [MarkLogic Dev General] Registered Query Best Practices

Geert Josten Tue, 30 Jul 2013 12:30:43 -0700

Hi Ron,

If you broke your queries into smaller parts, would you end up with a finite
number of sub-queries? If so, perhaps you can combine multiple registered
queries.
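
A minimal sketch of that idea in XQuery, assuming MarkLogic's cts:register
and cts:registered-query: each reusable sub-query gets its own registration,
and only a cheap cts:and-query wrapper is built per request. The collection
name and the "status" element are hypothetical. The parts are registered
inline only to keep the sketch short; in practice their IDs would be looked
up from wherever they are stored.

```xquery
xquery version "1.0-ml";

(: Hypothetical reusable building blocks. Registering them inline is only
   for brevity; a real module would fetch previously saved IDs. :)
let $part-ids := (
  cts:register(cts:collection-query("hypothetical-collection")),
  cts:register(cts:element-value-query(xs:QName("status"), "active"))
)
(: Combine the registered parts into one query; each part is wrapped
   with the same "unfiltered" option used in the original step 2. :)
let $combined := cts:and-query(
  for $id in $part-ids
  return cts:registered-query($id, "unfiltered"))
return cts:search(fn:doc(), $combined)[1 to 10]
```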


Cheers,
Geert

> -----Original Message-----
> From: [email protected] [mailto:general-[email protected]] On Behalf Of Ron Hitchens
> Sent: Tuesday, 30 July 2013 2:29
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
>
>
> Hi Geert,
>
>    I've done something before where we stored reg ids in a map for
> easy re-use.  In that case, there was a 1:1 correspondence between
> the reg id and a meaningful business domain number.  On this project
> that's not the case.
>
>    Also, there is not a finite set of queries that need to be registered,
> so it's not feasible to pre-register everything once.  New ones can be
> created dynamically.  And the complicated queries are persisted in
> another database and can be referenced later.  This means the queries
> that should be registered will persist across server restarts, while
> their registrations will not.  So there must be a way to register the
> queries on first use, then make use of those registered queries on
> subsequent requests.
>
>    The re-register-before-each-use pattern solves that nicely, but not if
> the query construction cost must be re-paid each time.  It looks like the
> robust solution will have to be catching exceptions for unregistered
> queries and reconstructing the registrations.  It's a shame, because that
> is going to add unnecessary complexity to the code.
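>
> A minimal sketch of that catch-and-rebuild pattern in XQuery, assuming
> MarkLogic raises XDMP-UNREGISTERED when a registration has been evicted.
> local:build-query, the query definition strings, and the "reg-ids"
> server field are hypothetical. Server fields are held in memory per app
> server on each host, so this cache starts empty again after a restart.
>
> ```xquery
> xquery version "1.0-ml";
>
> (: Hypothetical: rebuild the expensive cts:query from its persisted
>    definition, e.g. one stored in another database. :)
> declare function local:build-query($definition as xs:string) as cts:query
> {
>   cts:word-query($definition)
> };
>
> (: Return a registration ID for $definition, reusing the one cached in
>    the "reg-ids" server field unless $fresh is true. :)
> declare function local:reg-id($definition as xs:string, $fresh as xs:boolean)
>   as xs:unsignedLong
> {
>   let $cache := (xdmp:get-server-field("reg-ids"), map:map())[1]
>   let $cached := map:get($cache, $definition)
>   return
>     if (fn:exists($cached) and fn:not($fresh)) then $cached
>     else
>       let $id := cts:register(local:build-query($definition))
>       let $_ := map:put($cache, $definition, $id)
>       let $_ := xdmp:set-server-field("reg-ids", $cache)
>       return $id
> };
>
> (: Search with the cached ID; if the registration has been evicted,
>    rebuild, re-register and retry once. Search results may be evaluated
>    lazily, so confirm on your version that the error is raised inside
>    the try rather than after it. :)
> declare function local:search($definition as xs:string) as document-node()*
> {
>   try {
>     cts:search(fn:doc(),
>       cts:registered-query(local:reg-id($definition, fn:false()), "unfiltered"))[1 to 10]
>   }
>   catch ($e) {
>     if ($e/error:code = "XDMP-UNREGISTERED") then
>       cts:search(fn:doc(),
>         cts:registered-query(local:reg-id($definition, fn:true()), "unfiltered"))[1 to 10]
>     else xdmp:rethrow()
>   }
> };
>
> local:search("example")
> ```
>
> Rebuilding only on XDMP-UNREGISTERED keeps the query construction cost off
> the common path. The trade-off is the extra plumbing described above: the
> search code needs the query definition, not just the ID.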
>
> ---
> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>      +44 7879 358 212 (voice)          http://www.ronsoft.com
>      +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
> "No amount of belief establishes any fact." -Unknown
>
>
> On Jul 29, 2013, at 8:15 PM, Geert Josten <[email protected]> wrote:
>
> > Hi Ron,
> >
> > I recently saw a strategy where they deliberately took a different
> > approach. In their case, computing the queries was not straightforward,
> > and a query could run to 30k search terms. Additionally, registering each
> > query and warming up the cache with one initial search afterwards took
> > most of the time. They were searching roughly 40 million documents; the
> > searches themselves were sub-second.
> >
> > Their approach was to store all registered query IDs somewhere and have
> > them readily available at actual search time. They also used try/catch
> > to handle unregistered queries. In their case those shouldn't actually
> > occur, and when they did, they dramatically pulled down the average in
> > performance tests.
> >
> > How likely is it that a query becomes unregistered if you prepare all
> > queries beforehand?
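> >
> > For comparison, the prepare-everything-beforehand approach might look
> > like this minimal sketch, assuming the set of query definitions is known
> > up front. The definitions, local:build-query and the "reg-ids" server
> > field are hypothetical, and using xdmp:estimate as the warm-up search is
> > an assumption, not necessarily what that project did.
> >
> > ```xquery
> > xquery version "1.0-ml";
> >
> > (: Hypothetical query definitions known in advance. :)
> > declare variable $DEFINITIONS as xs:string* := ("alpha", "beta");
> >
> > (: Hypothetical stand-in for the expensive query construction. :)
> > declare function local:build-query($definition as xs:string) as cts:query
> > {
> >   cts:word-query($definition)
> > };
> >
> > let $ids := map:map()
> > return (
> >   for $definition in $DEFINITIONS
> >   let $id := cts:register(local:build-query($definition))
> >   return (
> >     (: One initial search per registration to warm the caches. :)
> >     xdmp:estimate(cts:search(fn:doc(), cts:registered-query($id, "unfiltered"))),
> >     map:put($ids, $definition, $id)
> >   ),
> >   (: Publish the IDs so search requests on this app server can reuse them. :)
> >   xdmp:set-server-field("reg-ids", $ids)
> > )
> > ```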
> >
> > Cheers,
> > Geert
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:general-[email protected]] On Behalf Of Michael Blakeley
> >> Sent: Monday, 29 July 2013 21:08
> >> To: MarkLogic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
> >>
> >> I think you're using registered queries as intended. That behavior
> >> sounds odd to me. I would expect (2) to be cheap, just a hash operation
> >> on the query terms, and I would expect (3) to be the expensive step.
> >>
> >> So I would contact support and see what they think.
> >>
> >> -- Mike
> >>
> >> On 29 Jul 2013, at 11:03 , Ron Hitchens <[email protected]> wrote:
> >>
> >>>
> >>>  What is the best practice these days for using registered
> >>> queries?  I was under the impression that the pattern should be:
> >>>
> >>> 1) Create your query:
> >>>   let $query := cts:and-query((blah blah blah))
> >>> 2) Register it and make a registered query from it in one step:
> >>>   let $reg-query := cts:registered-query(cts:register($query), "unfiltered")
> >>> 3) Use it in a search:
> >>>   return cts:search(fn:doc(), $reg-query)
> >>>
> >>>  The theory being that if the cts:query described by $query is
> >>> already registered, then the registration is essentially a no-op
> >>> and you'll get back the same ID.  And doing this every time ensures
> >>> that if the registered query has been evicted for some reason, then
> >>> it's re-registered and all is well.
> >>>
> >>>  It's a nice theory, but it seems to be based on the assumption that
> >>> creating a cts:query object is very cheap.  Unfortunately, I'm finding
> >>> that this is often not the case, especially when there are lots of
> >>> documents in the database.  I have a test case where performing Step 2
> >>> above on a moderately complicated query takes roughly 200ms every time.
> >>> Others take even longer, and all seem to be proportional to database
> >>> size.  But running Step 3 with cts:registered-query(<regid>) is very,
> >>> very fast (~0ms).  Re-creating the query for re-registering every time
> >>> is destroying the benefit of using a registered query.
> >>>
> >>>  I can obviously save the registration ID obtained from calling
> >>> cts:register and then make a cts:registered-query each time, but then
> >>> I'm not protected from the query becoming unregistered.  And there is
> >>> no lightweight way to test if an ID is still registered.  The only way
> >>> I know to make this robust is to put a loop and try/catch around the
> >>> code that does the search.  But that requires passing along enough
> >>> context to re-construct and re-register the queries (there can be
> >>> dozens of them in this case).  This is obviously a lot harder than
> >>> building the complex query in one module and then passing it along
> >>> to the search code somewhere else.
> >>>
> >>>  What's the generally accepted best usage pattern for registered
> >>> queries?  And is it my imagination, or has the cost of running queries
> >>> been moving from query evaluation into query construction?
> >>>
> >>>  Thanks.
> >>>
> >>> ---
> >>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
> >>>    +44 7879 358 212 (voice)          http://www.ronsoft.com
> >>>    +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
> >>> "No amount of belief establishes any fact." -Unknown
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> General mailing list
> >>> [email protected]
> >>> http://developer.marklogic.com/mailman/listinfo/general
> >>>