Re: Jackrabbit query performance issues

Marcel Reutegger Mon, 12 Feb 2007 04:07:36 -0800

Hi Savas,

Savas Triantafillou wrote:

1.  As you may see in the following queries, I would like to load all nodes
of a certain type  using several forms


    The first one provides no information about the root path of the nodes,
nor any information about their name

            DEBUG - QueryImpl.execute(149) | executed in 0,26 s.
(//element(*, my:object))


     The second one provides information about the node's name and is
already slower than the first one, considering that it executed immediately
after the first query
     (i.e. cache seemed not to be working) and that it is slightly more
specific than the first one

              DEBUG - QueryImpl.execute(149) | executed in 0,36 s.
(//element(objectName, my:object))


This runs much faster on my Jackrabbit instance.

I'm using 2000 test nodes of type nt:unstructured, with each query below returning 21 nodes.

QueryImpl: executed in 0.14 s. (//element(node0, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node1, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node2, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node3, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node4, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node5, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node6, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node7, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node8, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node9, nt:unstructured))

The first query is considerably slower because the path cache in the query handler needs to be filled.
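
When comparing query times it can help to treat the first run separately, since it pays for filling the cache. The following is only a minimal sketch of that idea in plain Java: the class name WarmupTiming, its methods and the placeholder workload are hypothetical and not part of Jackrabbit. In a real test the Runnable would execute the query.

```java
import java.util.ArrayList;
import java.util.List;

class WarmupTiming {

    /** Runs the task the given number of times and returns each elapsed time in milliseconds. */
    public static List<Long> timeRuns(Runnable task, int runs) {
        List<Long> elapsed = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed.add((System.nanoTime() - start) / 1_000_000);
        }
        return elapsed;
    }

    /** Averages all runs except the first one, which includes the cache warm-up. */
    public static double averageAfterWarmup(List<Long> elapsedMillis) {
        if (elapsedMillis.size() < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        long sum = 0;
        for (int i = 1; i < elapsedMillis.size(); i++) {
            sum += elapsedMillis.get(i);
        }
        return (double) sum / (elapsedMillis.size() - 1);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption); replace with the query execution.
        Runnable task = () -> Math.sqrt(42.0);
        List<Long> times = timeRuns(task, 10);
        System.out.println("first run (warm-up): " + times.get(0) + " ms");
        System.out.println("average of the rest: " + averageAfterWarmup(times) + " ms");
    }
}
```

Reporting the first run on its own keeps the warm-up cost visible without letting it distort the average of the later runs.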

     The third query is similar to the first one except for the presence of
the ordering. Is the difference in time justified only by the presence of the
ordering?

               DEBUG - QueryImpl.execute(149) | executed in 1,03 s.
(//element(*, my:object) order by @modified descending)

      The fourth query is similar to the second one with the addition of
the ordering. Taking into account query execution times so far
      this time seems the most rational

               DEBUG - QueryImpl.execute(149) | executed in 0,58 s.
(//element(objectName, my:object) order by @modified descending)

       The fifth query is more specific concerning the path of the nodes.
It seems that the cache is working now

               DEBUG - QueryImpl.execute(149) | executed in 0,12 s.
(/jcr:root/my:system/my:objectRoot//element(*, my:object))

        The sixth query is even more specific, yet it is slower than the
above one!!!!

That's probably because it involves an additional AND operation: nodes with a certain name are intersected with nodes of a certain type, whereas the fifth query only searches for nodes of a certain type.

                DEBUG - QueryImpl.execute(149) | executed in 0,25 s.
(/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
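
To illustrate the extra AND step described for the sixth query, here is a toy model in plain Java. It is not Jackrabbit's actual implementation: the class IntersectionSketch and the document ids are made up, and the only point is that a name-plus-type constraint has to build two match sets and intersect them, while a type-only constraint needs just one set.

```java
import java.util.BitSet;

class IntersectionSketch {

    /** Type constraint only: a single set of matching ids is enough. */
    public static BitSet byType(BitSet typeMatches) {
        return (BitSet) typeMatches.clone();
    }

    /** Name and type constraint: both sets are needed and then intersected. */
    public static BitSet byNameAndType(BitSet nameMatches, BitSet typeMatches) {
        BitSet result = (BitSet) nameMatches.clone(); // leave the inputs untouched
        result.and(typeMatches);                      // the additional AND operation
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ids of nodes matching the name constraint.
        BitSet nameMatches = new BitSet();
        nameMatches.set(1);
        nameMatches.set(3);
        nameMatches.set(5);

        // Hypothetical ids of nodes matching the type constraint (1 to 4).
        BitSet typeMatches = new BitSet();
        typeMatches.set(1, 5);

        System.out.println("type only:     " + byType(typeMatches));
        System.out.println("name and type: " + byNameAndType(nameMatches, typeMatches));
    }
}
```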

        The last two queries differ in the presence of the ordering

                  DEBUG - QueryImpl.execute(149) | executed in 0,62 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*, my:object) order by @modified descending)
                 DEBUG - QueryImpl.execute(149) | executed in 0,14 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName, my:object) order by @modified descending)


Now, in order to have a more complete view, I have changed the order of the
queries in that more specific queries are executed first. Here are the
results

DEBUG - QueryImpl.execute(149) | executed in 0,55 s.
(/jcr:root/my:system/my:objectRoot//element(*, my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,44 s.
(/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
DEBUG - QueryImpl.execute(149) | executed in 1,36 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*, my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,16 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName, my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,03 s.
(//element(*, my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,30 s.
(//element(objectName, my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,28 s.
(//element(*, my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,11 s.
(//element(objectName, my:object) order by @modified descending)


My belief is that there is no specific rule for creating a query that will
guarantee a satisfactory time, not even the most obvious one, i.e. the more
specific the query is,
the faster it becomes.

This is not always the case, e.g. more specific may in some cases also mean more complex to execute.

2.  For each one of the 340 nodes I have created 40 versions and then rerun
the above queries. All times tripled, which makes me think that a query of
type

    //element(*, my:nodeType)  will make Jackrabbit search through its
version nodes as well. If this is the case, why is this happening?

Because the query also includes the jcr:system subtree. If you are not interested in nodes from the version store you need to exclude the jcr:system subtree, e.g. have your content under a designated node instead of directly under the root node. Then you can search just in your content:

/jcr:root/my:content//element(*, my:type)

OR

If you don't want versions in your query results at all, you can also disable indexing of versions:


- Remove or comment the tag /Repository/SearchIndex in your repository.xml

This change requires that you re-index all workspaces.
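
For the first option (searching only below a designated content node), a small helper can keep the path prefix in one place so that no query accidentally starts at the root and pulls in jcr:system. This is just a sketch under assumptions: the class ScopedQuery and its method are hypothetical, and it only builds the XPath statement as a string.

```java
class ScopedQuery {

    /**
     * Builds an XPath statement restricted to the subtree below contentPath,
     * for example /jcr:root/my:content//element(*, my:type).
     */
    public static String elementQuery(String contentPath, String nodeName, String nodeType) {
        if (!contentPath.startsWith("/") || contentPath.endsWith("/")) {
            throw new IllegalArgumentException(
                    "contentPath must start with '/' and must not end with '/': " + contentPath);
        }
        return "/jcr:root" + contentPath + "//element(" + nodeName + ", " + nodeType + ")";
    }

    public static void main(String[] args) {
        // Same query as in the example above, limited to the content subtree.
        System.out.println(elementQuery("/my:content", "*", "my:type"));
    }
}
```

The resulting string can then be passed to QueryManager.createQuery(statement, Query.XPATH) as usual.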

I would really appreciate your thoughts as we are using Jackrabbit as a
backend to a portal and migration from 1.1.1 to 1.2.1 changed portal
performance dramatically.

Can you please provide examples of queries that changed in performance between the two versions?


regards
 marcel
