[ 
https://issues.apache.org/jira/browse/JCR-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971692#action_12971692
 ] 

Thomas Draier commented on JCR-2835:
------------------------------------

Hi,

I just made some improvements to the DescendantNode constraint, using the same
kind of subquery we use for XPath (DescendantSelfAxisQuery).

First I had to slightly change the XPath test to make it more comparable with
the SQL-2 one, as the current query in DescendantSearchTest does not return
any result :-)

So instead of : /testroot//*...@testcount=" + i + "]"  
I used :  /jcr:root/testroot//element(*,nt:base)[...@testcount=" + i + "]"
(I added the jcr:root prefix and an nt:base constraint to have the same
constraint as in SQL-2. By the way, this could also be improved, as a
constraint on nt:base does not make much sense and does not need to be
expanded to all subtypes.)

Before patching, I had these figures, similar to what Serge got before:

# DescendantSearchTest                   min     10%     50%     90%     max
2.2                                      411     416     430     450     690
# SQL2DescendantSearchTest               min     10%     50%     90%     max
2.2                                   203530  203530  203530  203530  203530

After patching:

# DescendantSearchTest                   min     10%     50%     90%     max
2.3                                      420     429     448     479    1208
# SQL2DescendantSearchTest               min     10%     50%     90%     max
2.3                                      319     327     339     351     375

This makes the SQL-2 queries even faster than the XPath ones. Basically, I use
a DescendantSelfAxisQuery with subqueries when possible. Compared to XPath, the
context query (the one that gets the ancestor node) is simpler, as it is based
on the node id instead of nested ChildAxisQuery queries, which may explain why
SQL-2 is slightly faster.
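
To put numbers on the figures above, here is a small sketch that derives the
ratios from the median (50%) columns. The class and method names are made up
for illustration. The 2.2 SQL-2 row shows the same value in every column, so
that ratio should be read as an order of magnitude (roughly 600 times faster)
rather than a precise measurement.

```java
public class Jcr2835Speedup {

    /** Ratio between two timings taken from the benchmark tables. */
    public static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Median (50%) values copied from the tables in this comment.
        double sql2Before = 203530;
        double sql2After = 339;
        double xpathBefore = 430;
        double xpathAfter = 448;

        System.out.printf("SQL-2 before/after patch: %.1f%n",
                ratio(sql2Before, sql2After));
        System.out.printf("XPath before/after patch: %.2f%n",
                ratio(xpathBefore, xpathAfter));
        System.out.printf("XPath/SQL-2 after patch:  %.2f%n",
                ratio(xpathAfter, sql2After));
    }
}
```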

For example, an XPath query like:

      /jcr:root/folder1/folder2//element(*,nt:type)

is translated to:

+DescendantSelfAxisQuery(
      +ChildAxisQuery(
            +ChildAxisQuery(
                 _:PARENT:, 
                 {}folder1), 
            {}folder2), 
      +_:PROPERTIES:1570322:primaryType[14877513:type, 
      1)

Whereas the equivalent SQL-2 query:

      select * from [nt:type] as obj
      where ISDESCENDANTNODE(obj, '/folder1/folder2')

gives:
      
DescendantSelfAxisQuery(_:UUID:a4137e73-6a16-4148-9d61-2353230a15d0, 
      +_:PROPERTIES:1570322:primaryType[14877513:type, 
      1)
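
To illustrate why this helps, here is a toy model in plain Java. It uses no
Jackrabbit or Lucene classes, and every name in it is hypothetical. It
contrasts the breadth-first expansion from the issue description (one index
lookup per visited node) with a check that walks up from each candidate node
until it reaches the ancestor id. The second half is only my assumption of
what a descendant-axis check boils down to, not the actual
DescendantSelfAxisQuery code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DescendantSketch {

    // child id -> parent id; a hypothetical stand-in for the PARENT field.
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, List<String>> childrenOf = new HashMap<>();

    public void add(String parent, String child) {
        parentOf.put(child, parent);
        childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /**
     * Old approach: expand the subtree breadth-first. Each visited node
     * stands for one "PARENT = id" index query, so the result is the number
     * of queries needed just to build the constraint.
     */
    public int countLookupsBreadthFirst(String ancestor) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add(ancestor);
        int lookups = 0;
        while (!queue.isEmpty()) {
            String id = queue.removeFirst();
            lookups++;
            queue.addAll(childrenOf.getOrDefault(id, Collections.emptyList()));
        }
        return lookups;
    }

    /**
     * Assumed alternative: resolve the ancestor id once, then for each
     * candidate hit walk up the parent chain looking for that id.
     */
    public boolean isDescendant(String candidate, String ancestor) {
        String parent = parentOf.get(candidate);
        while (parent != null) {
            if (parent.equals(ancestor)) {
                return true;
            }
            parent = parentOf.get(parent);
        }
        return false;
    }
}
```

With the first method the cost grows with the size of the subtree, whatever
the node type filter is. With the second it grows with the number of
candidates and their depth.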

Note that this currently only works at the first level of the constraint: an
isDescendantNode constraint inside an OR / NOT boolean query won't use the
subquery. I don't think that's a big issue for OR, but it can be for NOT.
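
As a rough sketch of that restriction, the toy constraint tree below (all
class names hypothetical, not the Jackrabbit QOM classes) reports whether a
descendant-node constraint is reachable without passing through an OR or a
NOT. Treating nested ANDs as still being "first level" is an assumption made
for the sketch. The patch itself is the reference for the exact rule.

```java
import java.util.Arrays;
import java.util.List;

public class FirstLevelSketch {

    public interface Constraint {}

    public static final class DescendantNode implements Constraint {
        final String path;
        public DescendantNode(String path) { this.path = path; }
    }

    public static final class Other implements Constraint {}

    public static final class And implements Constraint {
        final List<Constraint> operands;
        public And(Constraint... operands) {
            this.operands = Arrays.asList(operands);
        }
    }

    public static final class Or implements Constraint {
        final List<Constraint> operands;
        public Or(Constraint... operands) {
            this.operands = Arrays.asList(operands);
        }
    }

    public static final class Not implements Constraint {
        final Constraint operand;
        public Not(Constraint operand) { this.operand = operand; }
    }

    /** True if a DescendantNode is reachable through AND operands only. */
    public static boolean usesSubquery(Constraint c) {
        if (c instanceof DescendantNode) {
            return true;
        }
        if (c instanceof And) {
            for (Constraint operand : ((And) c).operands) {
                if (usesSubquery(operand)) {
                    return true;
                }
            }
        }
        // Or, Not and any other constraint stop the search (assumption).
        return false;
    }
}
```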

The patch is attached.

Regards


> Poor performance of ISDESCENDANTNODE on SQL 2 queries
> -----------------------------------------------------
>
>                 Key: JCR-2835
>                 URL: https://issues.apache.org/jira/browse/JCR-2835
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-core, query
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Serge Huber
>             Fix For: 2.3.0
>
>         Attachments: JCR-2835_PerformanceTests.patch, 
> JCR-2835_Poor_performance_on_ISDESCENDANTNODE_constraint_v1.patch
>
>
> Using the latest source code, I have noticed very bad performance on SQL-2 
> queries that use the ISDESCENDANTNODE constraint on a large sub-tree. For 
> example, the query : 
> select * from [jnt:news] as news where ISDESCENDANTNODE(news,'/root/site') 
> order by news.[date] desc 
> executes in 600ms 
> select * from [jnt:news] as news order by news.[date] desc
> executes in 4ms
> From looking at the problem in the Yourkit profiler, it seems that the 
> culprit is the constraint building, that uses recursive Lucene searches to 
> build the list of descendant node IDs : 
>     private Query getDescendantNodeQuery(
>             DescendantNode dn, JackrabbitIndexSearcher searcher)
>             throws RepositoryException, IOException {
>         BooleanQuery query = new BooleanQuery();
>         try {
>             LinkedList<NodeId> ids = new LinkedList<NodeId>();
>             NodeImpl ancestor =
>                     (NodeImpl) session.getNode(dn.getAncestorPath());
>             ids.add(ancestor.getNodeId());
>             while (!ids.isEmpty()) {
>                 String id = ids.removeFirst().toString();
>                 Query q = new JackrabbitTermQuery(
>                         new Term(FieldNames.PARENT, id));
>                 QueryHits hits = searcher.evaluate(q);
>                 ScoreNode sn = hits.nextScoreNode();
>                 if (sn != null) {
>                     query.add(q, SHOULD);
>                     do {
>                         ids.add(sn.getNodeId());
>                         sn = hits.nextScoreNode();
>                     } while (sn != null);
>                 }
>             }
>         } catch (PathNotFoundException e) {
>             query.add(new JackrabbitTermQuery(new Term(
>                     FieldNames.UUID, "invalid-node-id")), // never matches
>                     SHOULD);
>         }
>         return query;
>     }
> In the above example this generates over 2800 Lucene queries, which is the 
> culprit. I wonder if it wouldn't be faster to retrieve the IDs by using the 
> JCR to retrieve the list of child IDs ?
> This was probably also missed because I didn't seem to find any performance 
> tests on this constraint.

