Hi Savas,
Savas Triantafillou wrote:
1. As you may see in the following queries, I would like to load all nodes
of a certain type using several query forms.
The first one provides no information about the root path of the nodes,
nor any information about their name:
DEBUG - QueryImpl.execute(149) | executed in 0,26 s.
(//element(*, my:object))
The second one provides information about the node's name and is
already slower than the first one, even though it was executed immediately
after the first query (i.e. the cache seemed not to be working) and is
slightly more specific than the first one:
DEBUG - QueryImpl.execute(149) | executed in 0,36 s.
(//element(objectName, my:object))
This runs much faster on my Jackrabbit instance.
I'm using 2000 test nodes of type nt:unstructured, with each query returning
21 nodes.
QueryImpl: executed in 0.14 s. (//element(node0, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node1, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node2, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node3, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node4, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node5, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node6, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node7, nt:unstructured))
QueryImpl: executed in 0.02 s. (//element(node8, nt:unstructured))
QueryImpl: executed in 0.00 s. (//element(node9, nt:unstructured))
The first query is considerably slower because the path cache in the query
handler needs to be filled.
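
When comparing timings like these, it can help to report the first (cold) run separately from the later (warm) runs. The following is only a sketch: QueryTimer and its methods are hypothetical names, not part of Jackrabbit, and the workload in main is a stand-in for executing a real query.

```java
// Hypothetical helper; QueryTimer and its methods are illustrative names only.
final class QueryTimer {
    private QueryTimer() {}

    /** Runs the action the given number of times; returns elapsed milliseconds per run, in order. */
    static long[] timeRuns(Runnable action, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run(); // in practice: execute the query and iterate over the result
            millis[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }

    /** Average over all runs except the first one, which pays for filling caches. */
    static double warmAverage(long[] millis) {
        if (millis.length < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        long sum = 0;
        for (int i = 1; i < millis.length; i++) {
            sum += millis[i];
        }
        return (double) sum / (millis.length - 1);
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the real query execution.
        long[] millis = timeRuns(() -> Math.sqrt(12345.0), 5);
        System.out.println("first run: " + millis[0] + " ms, warm average: "
                + warmAverage(millis) + " ms");
    }
}
```

Comparing two query forms on their warm averages, rather than on a single run each, avoids attributing the cache-fill cost to whichever query happens to run first.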
The third query is similar to the first one except for the presence of the
ordering. Is the difference in time justified only by the presence of the
ordering?
DEBUG - QueryImpl.execute(149) | executed in 1,03 s.
(//element(*, my:object) order by @modified descending)
The fourth query is similar to the second one with the addition of
the ordering. Taking into account the query execution times so far,
this time seems the most rational:
DEBUG - QueryImpl.execute(149) | executed in 0,58 s.
(//element(objectName, my:object) order by @modified descending)
The fifth query is more specific concerning the path of the nodes.
It seems that the cache is working now:
DEBUG - QueryImpl.execute(149) | executed in 0,12 s.
(/jcr:root/my:system/my:objectRoot//element(*, my:object))
The sixth query is even more specific, yet it is slower than the
above one!!!!
That's probably because it involves an additional AND operation: nodes with a
certain name are intersected with nodes of a certain type, whereas the previous
query only searches for nodes of a certain type.
DEBUG - QueryImpl.execute(149) | executed in 0,25 s.
(/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
The last two queries differ from the previous two by a deeper path and the
presence of the ordering:
DEBUG - QueryImpl.execute(149) | executed in 0,62 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*, my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,14 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName, my:object) order by @modified descending)
Now, in order to have a more complete view, I have changed the order of the
queries so that the more specific queries are executed first. Here are the
results:
DEBUG - QueryImpl.execute(149) | executed in 0,55 s.
(/jcr:root/my:system/my:objectRoot//element(*, my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,44 s.
(/jcr:root/my:system/my:objectRoot//element(objectName, my:object))
DEBUG - QueryImpl.execute(149) | executed in 1,36 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(*, my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,16 s.
(/jcr:root/my:system/my:objectRoot/objectNameTypeFolder//element(objectName, my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,03 s.
(//element(*, my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,30 s.
(//element(objectName, my:object))
DEBUG - QueryImpl.execute(149) | executed in 0,28 s.
(//element(*, my:object) order by @modified descending)
DEBUG - QueryImpl.execute(149) | executed in 0,11 s.
(//element(objectName, my:object) order by @modified descending)
My belief is that there is no specific rule for creating a query that will
guarantee a satisfactory time, not even the most obvious one, i.e. the more
specific the query is,
the faster it becomes.
This is not always the case. For example, a more specific query may in some
cases also be more complex to execute.
2. For each one of the 340 nodes I have created 40 versions and then reran
the above queries. All times tripled, which makes me think that a query of
type
//element(*, my:nodeType) will make Jackrabbit search through its
version nodes as well. If this is the case, why is this happening?
Because the query also includes the jcr:system subtree. If you are not
interested in nodes from the version store, you need to exclude the jcr:system
subtree. For example, keep your content under a designated node instead of
directly under the root node. Then you can search just in your content:
/jcr:root/my:content//element(*, my:type)
Or, if you don't want versions in your query results at all, you can also
disable indexing of versions:
- Remove or comment out the element /Repository/SearchIndex in your repository.xml
This change requires that you re-index all workspaces.
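
For the first option, the path-restricted statement can be assembled from the content root, the node name and the node type. This is a minimal sketch: XPathStatements and underContentRoot are hypothetical names, and no escaping of names is attempted. In JCR 1.0 the resulting string would be passed to QueryManager.createQuery(statement, Query.XPATH).

```java
// Hypothetical helper; the class and method names are illustrative only.
final class XPathStatements {
    private XPathStatements() {}

    /**
     * Builds "/jcr:root/<contentPath>//element(<name>, <nodeType>)".
     * contentPath is the designated content node, e.g. "my:content".
     * Pass "*" as name to match nodes of any name.
     * Assumption: all arguments are already valid XPath names (no escaping is done).
     */
    static String underContentRoot(String contentPath, String name, String nodeType) {
        // Drop leading and trailing slashes so both "my:content" and "/my:content/" work.
        String path = contentPath.replaceAll("^/+|/+$", "");
        if (path.isEmpty()) {
            // An empty path would search from the root and include jcr:system again.
            throw new IllegalArgumentException("contentPath must name a node below the root");
        }
        return "/jcr:root/" + path + "//element(" + name + ", " + nodeType + ")";
    }

    public static void main(String[] args) {
        // Prints: /jcr:root/my:content//element(*, my:type)
        System.out.println(underContentRoot("my:content", "*", "my:type"));
    }
}
```

Rejecting an empty content path is deliberate here: a statement starting at the root would again cover the jcr:system subtree and therefore the version store.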
I would really appreciate your thoughts as we are using Jackrabbit as a
backend to a portal and migration from 1.1.1 to 1.2.1 changed portal
performance dramatically.
Can you please provide examples of queries that changed in performance between
the two versions?
regards
marcel