Hello,
Jukka pointed out that generic discussions should be on dev and not in JIRA
issues, so I'll repeat here my last comment from JCR-1213. Also related is
JCR-1196 [Queries for DescendantSelfAxisWeight/ChildAxisQuery are currently
very heavy and become slow pretty quickly].
Background: Marcel, Christoph and I have been working to fix the cache of
DescendantSelfAxisWeight/ChildAxisQuery regarding the hierarchy resolver. This
cache seems to be fixed now (not yet in trunk), but during tests I noticed
some things that seem illogical to me (I'll now copy-paste from JCR-1213):
---------------------000---------------------
During the tests (with the fixed hierarchy cache), having 1.200.000 nodes in
the repository, I realized we are still doing something 'irrational'. It won't
be easy to implement I think, because it also depends on whether people
have implemented an AccessManager, but suppose I have the following test:
Query q = qm.createQuery("stuff//[EMAIL PROTECTED]", Query.XPATH);
if (q instanceof QueryImpl) {
    // limit the result set
    ((QueryImpl) q).setLimit(1);
}
Since my "stuff//[EMAIL PROTECTED]" gives me 1.200.000 hits, I think it makes
perfect sense to users that, even with our patches and a working cache,
retrieving them all would be slow. But if I set the limit to 1 or 10, I would
expect good performance (certainly when you have not implemented any
AccessManager).
But if I set the limit to 1, why would we have to check all 1.200.000 parents
to see whether the path is correct?
If I get the sorted hits from Lucene for the "//[EMAIL PROTECTED]" part only
(so without the initial path, perhaps with an order by as well), I would want
to start with the first one and check its parents, then the second, etc., until
I have a hit that is correct according to its path. If I have a limit of 10, we
would need to get 10 successes. Obviously, in the worst case scenario, we would
still have to check every hit for its parents, but this would be rather
exceptional I think.
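
To make the idea concrete, here is a minimal sketch of the "check the path
lazily, stop at the limit" loop described above. It is not Jackrabbit code:
the class LazyPathFilter, the method firstMatches and the checks counter are
hypothetical names, and each hit is represented simply as a path string,
whereas the real implementation would have to walk the parents through the
hierarchy resolver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class LazyPathFilter {

    /** Number of path checks done by the last call (for illustration only). */
    static int checks = 0;

    /**
     * Walks the hits in their sorted order and applies the (expensive) path
     * check one hit at a time, stopping as soon as 'limit' hits are accepted.
     * Hypothetical helper, not part of the Jackrabbit API.
     */
    static <T> List<T> firstMatches(Iterable<T> sortedHits,
                                    Predicate<T> pathMatches,
                                    int limit) {
        List<T> result = new ArrayList<>();
        checks = 0;
        if (limit <= 0) {
            return result;
        }
        for (T hit : sortedHits) {
            checks++;                       // one path check per visited hit
            if (pathMatches.test(hit)) {
                result.add(hit);
                if (result.size() >= limit) {
                    break;                  // enough successes, skip the rest
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: hits are shown as absolute paths for readability.
        List<String> hits = Arrays.asList(
                "/other/a", "/stuff/b", "/stuff/c/d", "/other/e", "/stuff/f");
        List<String> top = firstMatches(hits, p -> p.startsWith("/stuff/"), 1);
        // prints: [/stuff/b] after 2 checks
        System.out.println(top + " after " + checks + " checks");
    }
}
```

With a limit of 1 only two of the five hits get their path checked. The worst
case is unchanged: if no hit matches, every hit is still checked once.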
Of course, when people have a custom AccessManager impl, you only know after
the access manager check whether the hit was a real hit. But when I have
Query q = qm.createQuery("stuff//[EMAIL PROTECTED]", Query.XPATH);
if (q instanceof QueryImpl) {
    // limit the result set
    ((QueryImpl) q).setLimit(1);
}
and I have > 1.000.000 hits, and I have to wait a few seconds even in the
cached version, while changing "stuff//[EMAIL PROTECTED]" into
"//[EMAIL PROTECTED]" reduces it to a couple of ms, that does not make sense.
I think we should consider whether we could do the DescendantSelfAxisQuery or
ChildAxisQuery as some sort of lazy filter. In the end, when users also want
the total number of hits for "stuff//[EMAIL PROTECTED]", we obviously are still
facing a slow query. WDOT? This thought obviously might belong in a new JIRA
issue, or in the existing one about the DescendantSelfAxisQuery and
ChildAxisQuery performance.
---------------------000---------------------
WDOT?
Regards, Ard