Hello,

Jukka pointed out that general discussions should happen on the dev list rather 
than in JIRA issues, so I'll repeat my last comment from JCR-1213 here. Also related is 
JCR-1196 [Queries for DescendantSelfAxisWeight/ChildAxisQuery are currently 
very heavy and become slow pretty quickly].

Background: Marcel, Christoph and I have been working on fixing the hierarchy 
resolver cache used by DescendantSelfAxisWeight/ChildAxisQuery. This cache seems 
to be fixed now (not yet in trunk), but during testing I noticed some things 
that seem illogical to me (copied from JCR-1213):

           ---------------------000---------------------
During the tests (with the fixed hierarchy cache), with 1,200,000 nodes in the 
repository, I realized we are still doing something 'irrational'. A fix won't 
be easy to implement, I think, because it also depends on whether people have 
implemented a custom AccessManager, but consider the following test:

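// qm = session.getWorkspace().getQueryManager()
Query q = qm.createQuery("stuff//[EMAIL PROTECTED]", Query.XPATH);
if (q instanceof QueryImpl) {
    // limit the result set (Jackrabbit-specific: org.apache.jackrabbit.core.query.QueryImpl)
    ((QueryImpl) q).setLimit(1);
}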

Since "stuff//[EMAIL PROTECTED]" matches 1,200,000 nodes, I think users would 
accept that retrieving all of them is slow, even with our patches and a working 
cache. But if I set the limit to 1 or 10, I would expect good performance 
(certainly when no custom AccessManager has been implemented).

But if I set the limit to 1, why should we have to check the parents of all 
1,200,000 hits to see whether the path is correct?

If Lucene gives me the sorted hits for only the "//[EMAIL PROTECTED]" part 
(perhaps with an order by as well), i.e. without the initial path, I would want 
to start with the first hit and check its parents, then the second, and so on, 
until I have a hit whose path is correct. With a limit of 10, we would need 10 
successes. Obviously, in the worst case we would still have to check the 
parents of every hit, but I think this would be rather exceptional.
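A minimal sketch of that loop in Java, just to make the idea concrete. It assumes plain strings as node ids and a toy child-to-parent map standing in for the real hierarchy resolver; LazyPathFilter, firstMatches and isDescendantOf are hypothetical names, not existing Jackrabbit API. For "stuff//..." it walks up all ancestors; a ChildAxisQuery would only need the direct parent.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: apply the path constraint lazily to hits that Lucene
// already returned in sorted order, stopping once `limit` hits have passed.
class LazyPathFilter {

    // True if ancestorId is a proper ancestor of nodeId ("stuff//..." semantics).
    // parentOf returns null for the root node.
    static boolean isDescendantOf(String nodeId, String ancestorId,
                                  Function<String, String> parentOf) {
        for (String p = parentOf.apply(nodeId); p != null; p = parentOf.apply(p)) {
            if (p.equals(ancestorId)) {
                return true;
            }
        }
        return false;
    }

    // Walk the sorted hits in order and keep the first `limit` that pass the check.
    static List<String> firstMatches(Iterator<String> sortedHits,
                                     Predicate<String> pathOk, int limit) {
        List<String> result = new ArrayList<>();
        while (result.size() < limit && sortedHits.hasNext()) {
            String hit = sortedHits.next();
            if (pathOk.test(hit)) {
                result.add(hit);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy hierarchy (child -> parent); "root" has no entry, so its parent is null.
        Map<String, String> parents = Map.of(
                "stuff", "root", "other", "root",
                "a", "other", "b", "stuff", "c", "b", "d", "stuff");
        // Hits for the "//[...]" part only, in the order Lucene returned them.
        List<String> hits = List.of("a", "c", "d", "b");
        List<String> top1 = firstMatches(hits.iterator(),
                h -> isDescendantOf(h, "stuff", parents::get), 1);
        System.out.println(top1); // [c]
    }
}
```

With limit 1 only "a" and "c" are inspected; "d" and "b" are never checked, which is the whole point compared to resolving the path of every hit up front.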

Of course, when people have a custom AccessManager implementation, you only know 
whether a hit is a real hit after the access manager has checked it. But when I have

Query q = qm.createQuery("stuff//[EMAIL PROTECTED]", Query.XPATH);
if (q instanceof QueryImpl) {
    // limit the result set
    ((QueryImpl) q).setLimit(1);
}

and more than 1,000,000 hits, I still have to wait a few seconds, even with the 
cached version, while changing "stuff//[EMAIL PROTECTED]" into 
"//[EMAIL PROTECTED]" reduces this to a couple of ms. That does not make sense.

I think we should consider whether we could implement DescendantSelfAxisQuery 
and ChildAxisQuery as some sort of lazy filter. Of course, when users also want 
the total number of hits for "stuff//[EMAIL PROTECTED]", we still face a slow 
query. WDYT? This discussion might belong in a new JIRA issue, or in the 
existing one about DescendantSelfAxisQuery and ChildAxisQuery performance. 
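A rough sketch of what such a lazy filter could look like, as a Java Iterator decorator over the raw hits. LazyFilterIterator is a hypothetical class; the integer "hits" and both predicates are stand-ins for the real path check and a custom AccessManager, not Jackrabbit code. Because Predicate.and short-circuits, the access check only runs for hits whose path already matches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical "lazy filter": wraps the hit iterator (e.g. sorted Lucene hits)
// and evaluates the expensive constraint only when the next result is requested.
// Source elements must be non-null: null marks "nothing buffered".
class LazyFilterIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> accept;
    private T buffered;   // next accepted element, or null
    int evaluated;        // how many hits the constraint has been evaluated on

    LazyFilterIterator(Iterator<T> source, Predicate<T> accept) {
        this.source = source;
        this.accept = accept;
    }

    @Override
    public boolean hasNext() {
        while (buffered == null && source.hasNext()) {
            T candidate = source.next();
            evaluated++;
            if (accept.test(candidate)) {
                buffered = candidate;
            }
        }
        return buffered != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = buffered;
        buffered = null;
        return result;
    }

    public static void main(String[] args) {
        List<Integer> hits = List.of(5, 12, 7, 20, 3, 40);
        Predicate<Integer> pathOk = h -> h >= 10;   // stand-in for the ancestor check
        Predicate<Integer> canRead = h -> h != 20;  // stand-in for a custom AccessManager
        LazyFilterIterator<Integer> it =
                new LazyFilterIterator<>(hits.iterator(), pathOk.and(canRead));

        // limit = 1: stop after the first accepted hit
        List<Integer> firstOne = new ArrayList<>();
        if (it.hasNext()) {
            firstOne.add(it.next());
        }
        System.out.println(firstOne + " after " + it.evaluated + " checks"); // [12] after 2 checks

        // the total count still requires evaluating every remaining hit
        int total = firstOne.size();
        while (it.hasNext()) {
            it.next();
            total++;
        }
        System.out.println(total + " hits, " + it.evaluated + " checks"); // 2 hits, 6 checks
    }
}
```

Fetching one result only evaluates the constraint up to the first match, while asking for the total has to drain the whole iterator, which is exactly the slow case that remains.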

           ---------------------000---------------------

WDYT? 

Regards Ard
