[
https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971692#action_12971692
]
Thomas Draier commented on JCR-2835:
------------------------------------
Hi,
I just made some improvements to the DescendantNode constraint, using the same
kind of subquery we use in XPath (DescendantSelfAxisQuery).
First I had to slightly change the XPath test to make it more comparable with
the SQL-2 one, as the current query in DescendantSearchTest does not return
any results :-)
So instead of : /testroot//*...@testcount=" + i + "]"
I used : /jcr:root/testroot//element(*,nt:base)[...@testcount=" + i + "]"
(added the jcr:root prefix and an nt:base constraint to have the same constraint
as in SQL-2. By the way, this could also be improved, as a constraint on nt:base
does not make much sense and does not need to be expanded to all subtypes.)
Before patching, I had these figures, similar to what Serge got before:
# DescendantSearchTest         min     10%     50%     90%     max
2.2                            411     416     430     450     690
# SQL2DescendantSearchTest     min     10%     50%     90%     max
2.2                         203530  203530  203530  203530  203530
After patching:
# DescendantSearchTest         min     10%     50%     90%     max
2.3                            420     429     448     479    1208
# SQL2DescendantSearchTest     min     10%     50%     90%     max
2.3                            319     327     339     351     375
This makes the SQL-2 queries even faster than the XPath ones. Basically, I use a
DescendantSelfAxisQuery with subqueries when possible. Compared to XPath, the
context query (the one that gets the ancestor node) is simpler, as it is
based on the node id instead of nested ChildAxisQuery queries, which may explain
why SQL-2 is slightly faster.
For example, an XPath query like:
/jcr:root/folder1/folder2//element(*,nt:type)
is translated to:
+DescendantSelfAxisQuery(
  +ChildAxisQuery(
    +ChildAxisQuery(
      _:PARENT:,
      {}folder1),
    {}folder2),
  +_:PROPERTIES:1570322:primaryType[14877513:type,
  1)
whereas the equivalent SQL-2 query
select * from [nt:type] as obj where ISDESCENDANTNODE(obj, '/folder1/folder2')
gives:
DescendantSelfAxisQuery(
  _:UUID:a4137e73-6a16-4148-9d61-2353230a15d0,
  +_:PROPERTIES:1570322:primaryType[14877513:type,
  1)
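To illustrate why resolving the ancestor once by node id is cheaper than expanding the whole subtree, here is a minimal toy sketch. It is not the Jackrabbit code: the class DescendantLookupSketch, its maps and the lookups counter are all made up for illustration, and the assumption that walking up the parent chain costs no index query is mine.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names exist in Jackrabbit.
class DescendantLookupSketch {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, List<String>> childrenOf = new HashMap<>();
    int lookups = 0; // number of simulated index queries

    void add(String id, String parentId) {
        parentOf.put(id, parentId);
        childrenOf.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
    }

    // Subtree expansion: one simulated PARENT query per visited node.
    Set<String> descendantsByExpansion(String ancestorId) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(ancestorId);
        while (!queue.isEmpty()) {
            String id = queue.removeFirst();
            lookups++;
            for (String child : childrenOf.getOrDefault(id, Collections.<String>emptyList())) {
                result.add(child);
                queue.add(child);
            }
        }
        return result;
    }

    // Context-based check: resolve the ancestor once, then test each candidate
    // (a hit of the other constraints) by walking up its parent chain.
    // Assumption: the walk uses in-memory hierarchy data, so it is not counted.
    Set<String> filterByAncestor(Collection<String> candidates, String ancestorId) {
        lookups++; // the single context lookup
        Set<String> result = new LinkedHashSet<>();
        for (String candidate : candidates) {
            for (String p = parentOf.get(candidate); p != null; p = parentOf.get(p)) {
                if (p.equals(ancestorId)) {
                    result.add(candidate);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DescendantLookupSketch index = new DescendantLookupSketch();
        index.add("a", "root");
        index.add("b", "root");
        index.add("a1", "a");
        index.add("a2", "a");
        index.add("b1", "b");
        index.add("a11", "a1");

        Set<String> all = index.descendantsByExpansion("a");
        System.out.println(all + " found with " + index.lookups + " lookups");

        index.lookups = 0;
        Set<String> hits = index.filterByAncestor(Arrays.asList("a11", "b1"), "a");
        System.out.println(hits + " found with " + index.lookups + " lookup(s)");
    }
}
```

In this toy model the expansion cost grows with the size of the subtree, while the context-based check needs one lookup plus a cheap walk per candidate.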
Note that the optimization currently only works at the first level of the
constraint: an ISDESCENDANTNODE constraint inside an OR / NOT boolean query
won't use the subquery. I don't think it's a big issue for OR, but it can be
for NOT.
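As a rough illustration of that restriction, the sketch below collects only the ISDESCENDANTNODE paths that are not nested under OR or NOT. The constraint classes are hypothetical stand-ins (not the javax.jcr.query.qom types, and not the code in the patch), and treating constraints combined with AND as still being "first level" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini constraint tree, for illustration only.
class TopLevelConstraintSketch {
    interface Constraint {}

    static final class DescendantNode implements Constraint {
        final String ancestorPath;
        DescendantNode(String ancestorPath) { this.ancestorPath = ancestorPath; }
    }

    static final class Comparison implements Constraint {
        final String expression;
        Comparison(String expression) { this.expression = expression; }
    }

    static final class And implements Constraint {
        final Constraint left, right;
        And(Constraint left, Constraint right) { this.left = left; this.right = right; }
    }

    static final class Or implements Constraint {
        final Constraint left, right;
        Or(Constraint left, Constraint right) { this.left = left; this.right = right; }
    }

    static final class Not implements Constraint {
        final Constraint inner;
        Not(Constraint inner) { this.inner = inner; }
    }

    // Returns the ancestor paths that could be handled with the subquery.
    static List<String> optimizablePaths(Constraint root) {
        List<String> paths = new ArrayList<>();
        collect(root, paths);
        return paths;
    }

    private static void collect(Constraint c, List<String> paths) {
        if (c instanceof DescendantNode) {
            paths.add(((DescendantNode) c).ancestorPath);
        } else if (c instanceof And) {
            // Assumption: AND keeps both sides at the "first level".
            collect(((And) c).left, paths);
            collect(((And) c).right, paths);
        }
        // Or / Not / Comparison: do not descend, so anything nested
        // below them falls back to the non-optimized path.
    }

    public static void main(String[] args) {
        Constraint where = new And(
                new DescendantNode("/folder1/folder2"),
                new Or(new DescendantNode("/other"), new Comparison("x = 1")));
        System.out.println(optimizablePaths(where));
        System.out.println(optimizablePaths(new Not(new DescendantNode("/folder1"))));
    }
}
```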
The patch is attached.
Regards
> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
> Key: JCR-2835
> URL: https://issues.apache.org/jira/browse/JCR-2835
> Project: Jackrabbit Content Repository
> Issue Type: Improvement
> Components: jackrabbit-core, query
> Affects Versions: 2.2.0, 2.2.1, 2.3.0
> Reporter: Serge Huber
> Fix For: 2.3.0
>
> Attachments: JCR-2835_PerformanceTests.patch,
> JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>
> Using the latest source code, I have noticed very bad performance on SQL-2
> queries that use the ISDESCENDANTNODE constraint on a large sub-tree. For
> example, the query :
> select * from [jnt:news] as news where ISDESCENDANTNODE(news,'/root/site')
> order by news.[date] desc
> executes in 600ms
> select * from [jnt:news] as news order by news.[date] desc
> executes in 4ms
> From looking at the problem in the Yourkit profiler, it seems that the
> culprit is the constraint building, that uses recursive Lucene searches to
> build the list of descendant node IDs :
> private Query getDescendantNodeQuery(
>         DescendantNode dn, JackrabbitIndexSearcher searcher)
>         throws RepositoryException, IOException {
>     BooleanQuery query = new BooleanQuery();
>     try {
>         LinkedList<NodeId> ids = new LinkedList<NodeId>();
>         NodeImpl ancestor = (NodeImpl) session.getNode(dn.getAncestorPath());
>         ids.add(ancestor.getNodeId());
>         while (!ids.isEmpty()) {
>             String id = ids.removeFirst().toString();
>             Query q = new JackrabbitTermQuery(new Term(FieldNames.PARENT, id));
>             QueryHits hits = searcher.evaluate(q);
>             ScoreNode sn = hits.nextScoreNode();
>             if (sn != null) {
>                 query.add(q, SHOULD);
>                 do {
>                     ids.add(sn.getNodeId());
>                     sn = hits.nextScoreNode();
>                 } while (sn != null);
>             }
>         }
>     } catch (PathNotFoundException e) {
>         query.add(new JackrabbitTermQuery(new Term(
>                 FieldNames.UUID, "invalid-node-id")), // never matches
>                 SHOULD);
>     }
>     return query;
> }
> In the above example this generates over 2800 Lucene queries, which is the
> culprit. I wonder if it wouldn't be faster to retrieve the IDs by using the
> JCR to retrieve the list of child IDs ?
> This was probably also missed because I didn't seem to find any performance
> tests on this constraint.