The time is being spent in the construction of the cts:query or, more
precisely, in what appears to be the reification of the query.  The first thing
that uses the query will incur the construction penalty for it.  That could be
cts:search, xdmp:plan, cts:register, etc.  In this case it looks like
cts:register is causing the full query to be constructed every time.  Some
queries apparently are very expensive to reify (like mine).

   If I do this, it will take 200ms every time, all day long:

let $query := cts:and-query ((blah blah blah))
let $reg-query := cts:registered-query (cts:register ($query), "unfiltered")
return cts:search (fn:doc(), $reg-query)

   But if I do this once, it takes 200ms:

cts:register ($query)
=>
9850570453574503534957423057

   And then later do this as many times as I like:

let $reg-query := cts:registered-query (9850570453574503534957423057)
return cts:search (fn:doc(), $reg-query)

   It always runs fast (xdmp:elapsed-time reports 0ms).

   I think what's happening here is that the cts:query is being reified
on reference by the cts:register function before it checks to see if a query
with the same fingerprint already exists.  It's that reification which takes
the 200ms and is therefore paid every time I re-register the query, defeating
any benefit of using the registered query.

   I have run into variations on this issue several times on various
projects, both on ML 5 and 6.  I can't believe no one else has run up
against it.  On a previous project we had to implement a registered query
id caching scheme because registering the queries took so long.  On my
current project that's not so easy because things are structured somewhat
differently.
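
   For illustration, a minimal sketch of what such an ID cache could look
like.  This is not the code from that earlier project.  It assumes a
per-host server field is an acceptable place to keep the ID, that the
field name and the local: function names are hypothetical, and that the
placeholder in local:build-query stands in for the real, expensive query.
It relies on the search raising XDMP-UNREGISTERED when the ID is no
longer registered, and uses xdmp:eager on the assumption that the search
must be forced inside the try block for the error to be caught there.

```xquery
xquery version "1.0-ml";

(: Hypothetical server field name; server fields are per-host and do not
   survive a restart, so the cache is only a best-effort shortcut. :)
declare variable $FIELD := "my-app:reg-query-id";

(: Placeholder for the expensive query construction. :)
declare function local:build-query() as cts:query
{
  cts:and-query((cts:word-query("example")))
};

(: Pay the construction and registration cost, then remember the ID. :)
declare function local:register() as xs:unsignedLong
{
  xdmp:set-server-field($FIELD, cts:register(local:build-query()))
};

(: Run the search by ID only; xdmp:eager forces evaluation here so that
   an unregistered ID fails inside the caller's try block. :)
declare function local:search-with-id($id as xs:unsignedLong) as node()*
{
  xdmp:eager(cts:search(fn:doc(), cts:registered-query($id, "unfiltered")))
};

let $cached := xdmp:get-server-field($FIELD)
let $id := if (fn:exists($cached)) then $cached else local:register()
return
  try {
    local:search-with-id($id)
  }
  catch ($e) {
    (: Re-register and retry once if the cached ID has been evicted. :)
    if ($e/error:code = "XDMP-UNREGISTERED")
    then local:search-with-id(local:register())
    else xdmp:rethrow()
  }
```

   With something like this the construction cost should only be paid on
a cache miss or after an eviction, but the code doing the search still
has to know how to rebuild the query, which is exactly the part that is
awkward on my current project.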

   Any advice from ML engineering on what to do when re-registering a previously
registered query is prohibitively expensive?  Has anyone else had to deal with
this problem?

---
Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
     +44 7879 358 212 (voice)          http://www.ronsoft.com
     +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
"No amount of belief establishes any fact." -Unknown


On Jul 29, 2013, at 8:35 PM, David Lee <[email protected]> wrote:

> It's possible your profile is misleading.  Many things during runtime are 
> deferred until evaluation so, I am guessing here, 
> but possibly the time you are seeing is not the query registration time but 
> rather its final instantiation when the query is used.
> 
> -----------------------------------------------------------------------------
> David Lee
> Lead Engineer
> MarkLogic Corporation
> [email protected]
> Phone: +1 812-482-5224
> Cell:  +1 812-630-7622
> www.marklogic.com
> 
> 
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Monday, July 29, 2013 3:08 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Registered Query Best Practices
> 
> I think you're using registered query as intended. That behavior sounds odd 
> to me. I would expect (2) to be cheap, just a hash operation on the query 
> terms, and I would expect (3) to be the expensive step.
> 
> So I would contact support and see what they think.
> 
> -- Mike
> 
> On 29 Jul 2013, at 11:03 , Ron Hitchens <[email protected]> wrote:
> 
>> 
>>  What is the best practice these days for using registered
>> queries?  I was under the impression that the pattern should be:
>> 
>> 1) Create your query:
>>   $query := cts:and-query ((blah blah blah))
>> 2) Register it and make a registered query from it in one step:
>>   $reg-query := cts:registered-query (cts:register ($query), "unfiltered")
>> 3) Use it in a search:
>>   cts:search (fn:doc(), $reg-query)
>> 
>>  The theory being that if the cts:query described by $query is
>> already registered, then the registration is essentially a no-op
>> and you'll get back the same ID.  And doing this every time ensures
>> that if the registered query has been evicted for some reason then
>> it's re-registered and all is well.
>> 
>>  It's a nice theory but seems to be based on the assumption that
>> creating a cts:query object is very cheap.  Unfortunately, I'm finding
>> that this is often not the case, especially when there are lots of
>> documents in the database.  I have a test case where performing Step 2
>> above on a moderately complicated query takes roughly 200ms every time.
>> Others take even longer and all seem to be proportional to database size.
>> But running Step 3 with cts:registered-query(<regid>) is very, very
>> fast (~0ms).  Re-creating the query for re-registering every time is
>> destroying the benefit of using a registered query.
>> 
>>  I can obviously save the registration ID obtained from calling
>> cts:register and then make a cts:registered-query each time, but then
>> I'm not protected from the query becoming unregistered.  And there is
>> no lightweight way to test if an ID is still registered.  The only way
>> I know to make this robust is to put a loop and try/catch around the
>> code that does the search.  But that requires passing along enough
>> context to re-construct and re-register the queries (there can be
>> dozens of them in this case).  This is obviously a lot harder than
>> building the complex query in one module and then passing it along
>> to the search code somewhere else.
>> 
>>  What's the generally accepted best usage pattern for registered
>> queries?  And is it my imagination or has the cost of running queries
>> been moving from query evaluation into query construction?
>> 
>>  Thanks.
>> 
>> ---
>> Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>    +44 7879 358 212 (voice)          http://www.ronsoft.com
>>    +1 707 924 3878 (fax)              Bit Twiddling At Its Finest
>> "No amount of belief establishes any fact." -Unknown
>> 
>> 
>> 
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>> 
> 
