That sounds like "entitlements". A registered query is a pretty good way to 
represent entitlements. But I wouldn't register all the queries up front. That 
seems like wasted effort if some of the users never log in. If it impacts 
update performance then it's wasted effort for every update between startup and 
the user's first query. Finally, the query code still has to be prepared to 
re-register the query if a search throws XDMP-UNREGISTERED, and that can happen 
at any time. So I think it's better to be lazy about registered queries.

How have you tested the idea that registered queries are causing the update 
problems? It would be annoying to put a lot of work into entitlements code 
without fixing that problem. For example, you could deregister all those 
queries, or restart all of the database's forests, which would have the same 
effect. Then you could test ingestion again, without any registered queries in 
play. 
Given the parallel thread about directory fragments I'd also set 
file-log-level=Debug and look out for debug-level XDMP-DEADLOCK messages.

Getting back to use of registered queries, my usual approach is to register 
each query as it's used, something like:

    cts:registered-query(
      cts:register(query-entitlements($userid)),
      'unfiltered')

This re-registers the query every time, which guards against XDMP-UNREGISTERED. 
It is cheap for queries that are already registered.
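
As a rough sketch of how that expression might sit inside a search: the 
local:query-entitlements function, the 'subscriber' element name, and the 
external $userid and $terms variables are all placeholders for illustration, 
not anything from this thread.

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs supplied by the caller. :)
declare variable $userid as xs:string external;
declare variable $terms as xs:string external;

(: Hypothetical helper: builds the entitlement query for one user.
   The element name is a placeholder; a real version would be built
   from the user's subscription data. :)
declare function local:query-entitlements($userid as xs:string) as cts:query
{
  cts:element-value-query(xs:QName('subscriber'), $userid)
};

(: Register on every request and use the returned id immediately,
   so the search never depends on a registration made earlier. :)
cts:search(
  fn:collection(),
  cts:and-query((
    cts:word-query($terms),
    cts:registered-query(
      cts:register(local:query-entitlements($userid)),
      'unfiltered'))))
```

The entitlement query is combined with the user's own search terms in a 
cts:and-query, so only the entitlement part is registered and reused.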

By registering queries lazily, queries belonging to idle users will tend to 
fall out of the system. If there's a front-end server that can tell your 
MarkLogic app-server when users log in or out, you could even preregister 
queries at login and deregister them at logout. However you'd still have to 
guard against XDMP-UNREGISTERED, as with the above code.

And as you outlined, I'd try to simplify the queries if possible. The 
complexity may be more of a problem than the quantity. One important tool for 
this is the "shotgun OR": most cts:query constructors accept a sequence of 
values and match any of them. It might also be worth looking into composable 
groups of registered queries, so that N users can share a smaller number of 
registered queries.
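
To illustrate the "shotgun OR", here is a minimal sketch. The 
'subscription-id' element name and the values are made up for the example. It 
returns the verbose form followed by the compact form, so the two can be 
compared side by side.

```xquery
xquery version "1.0-ml";

(: Hypothetical values, for illustration only. :)
let $ids := ('a1', 'b2', 'c3')
return (
  (: Verbose form: one value query per id, wrapped in an or-query. :)
  cts:or-query(
    for $id in $ids
    return cts:element-value-query(xs:QName('subscription-id'), $id)),

  (: "Shotgun OR": a single value query that takes the whole sequence
     and matches if any one of the values matches. :)
  cts:element-value-query(xs:QName('subscription-id'), $ids)
)
```

With thousands of values, the second form yields one query node instead of a 
large tree of nested queries.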

-- Mike

On 30 Mar 2014, at 02:13 , David Ennis <[email protected]> wrote:

> Hi Mike,
> 
> Thanks for the reply.  We've just started a consulting job at this client and 
> are unravelling the various levels of the programming.  We suspected that 
> registering all of those queries and then hitting the system with 100,000 
> inserts was bound to bog down the system. 
> 
> Their system had ~50 million items with ~2,000 users having access to certain 
> groups of those 50 million based on a subscription file per user.  These 
> subscription files have sometimes 20,000 entries.  It appears that early on, 
> they got stuck on how to approach this (we know that they are generating some 
> 'subscription queries' that have thousands of nested cts:and queries, for 
> instance). The solution at the time was to simply register the monstrous 
> queries.  This just appears to have compounded the issue by introducing 
> another item causing a bottleneck (the internal maintenance of the queries).  
> So when tuning the original queries, any gain in performance was likely 
> masked by the newer delay (registered queries).
> 
> Our approach now is likely to abandon their registered queries and use a 
> combination of (1) optimizing the original queries (looks like terms can be 
> boiled down to hundreds instead of thousands) and possibly also (2) 
> generating our own 'smart caches' per user that could be updated in a less 
> intensive manner.
> 
> Regards,
> David Ennis
> 
> 
> On 29 March 2014 14:58, Michael Blakeley <[email protected]> wrote:
> Registered queries are smart list-cache entries.
> 
> You've already deduced that that implies extra work when updates happen, 
> either immediately or when each registered query is next used. With a lot of 
> registered queries it's probably more efficient to do that work with each 
> update, but I haven't noticed that behavior myself.
> 
> Why pre-register so many queries? As a rule of thumb it isn't worth 
> registering a query unless it will be used 2-3 times. Maybe that should be 
> 2-3 times before the next update, too.
> 
> -- Mike
> 
> On 28 Mar 2014, at 22:48 , David Ennis <[email protected]> wrote:
> 
> > Hi,
> >
> > We have a client that has about 4,000 registered queries.  These are rather 
> > 'large' (taking about 30 minutes to register all of them).
> >
> > One of the tests yesterday seems to confirm that ingestion of new content 
> > is about half as fast when the queries are registered. Unregistering the 
> > queries again increases throughput of the ingestion.
> >
> > It should be noted that no queries are being run - they are just sitting 
> > registered.
> >
> > Can someone explain the inner workings of registered queries?  It seems to 
> > me that there is some level of maintenance of caches related to these 
> > registered queries as new documents are ingested - regardless of the query 
> > being used.
> >
> > Intuition says that this is likely the case, but I would like to be sure 
> > and cannot find enough information to truly support this theory.
> >
> > So, do registered queries do something that could be causing quite some 
> > overhead to internally maintain them while ingestion is happening?
> >
> > Kind Regards,
> > David
> >
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://developer.marklogic.com/mailman/listinfo/general