Hi everyone,

I have been looking at the search-related code over the last few days, because we need better performance for range queries on date fields as well as for sorting by date fields. These are my thoughts so far:

1. Wouldn't it make sense to skip the index for the "jcr:system" tree (which is located at repository/index by default) when the query being executed cannot match items in the "jcr:system" tree? Take for example a query like "my:app//element(*, foo:bar)". This query only searches for nodes located under "my:app", which excludes nodes under "jcr:system", so it doesn't need to search the "jcr:system" index. Since the "jcr:system" tree can grow quite quickly if you create a lot of versions, it might be worth excluding its index. I'm not sure, though, how hard it would be to determine whether a query needs to include the "jcr:system" index.
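To make idea 1 a bit more concrete, here is a minimal sketch of such a check, working on the raw XPath string. The class and method names (SystemIndexCheck, needsSystemIndex) are hypothetical and not part of Jackrabbit; a real implementation would more likely inspect the parsed query tree. The heuristic is deliberately conservative: whenever it cannot rule out "jcr:system", it says the index is needed.

```java
/** Hypothetical helper, not Jackrabbit API: a conservative string-level heuristic. */
final class SystemIndexCheck {

    private static final String ROOT = "/jcr:root";

    /** Returns true unless the query is clearly rooted in a subtree other than jcr:system. */
    static boolean needsSystemIndex(String xpath) {
        String path = xpath.trim();

        // Any explicit mention of jcr:system, or a union of several paths: play it safe.
        if (path.contains("jcr:system") || path.contains("|")) {
            return true;
        }

        // Strip an optional leading "/jcr:root" step.
        if (path.startsWith(ROOT)
                && (path.length() == ROOT.length() || path.charAt(ROOT.length()) == '/')) {
            path = path.substring(ROOT.length());
        }

        // "//..." selects descendants of the root node, which includes jcr:system.
        if (path.startsWith("//")) {
            return true;
        }
        if (path.startsWith("/")) {
            path = path.substring(1);
        }

        // Isolate the first location step and drop any predicate on it.
        int slash = path.indexOf('/');
        String first = slash < 0 ? path : path.substring(0, slash);
        int bracket = first.indexOf('[');
        if (bracket >= 0) {
            first = first.substring(0, bracket);
        }

        // An empty step, a wildcard or a node test can match jcr:system itself.
        if (first.isEmpty() || first.equals("*") || first.startsWith("element(")) {
            return true;
        }

        // The query is rooted at a concrete node name other than jcr:system.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsSystemIndex("my:app//element(*, foo:bar)"));
        System.out.println(needsSystemIndex("//element(*, nt:version)"));
    }
}
```

With this sketch the example query "my:app//element(*, foo:bar)" is reported as not needing the "jcr:system" index, while "//element(*, nt:version)" is. Predicates that dereference references into the version storage are not handled here and would need extra care.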

2. Lucene uses FieldCaches to speed up sorting and range queries, which is exactly what we are after. Those FieldCaches are per IndexReader. Jackrabbit uses an IndexSearcher which searches on a single IndexReader, which is most likely an instance of CachingMultiReader. So whenever a search builds up a FieldCache, that FieldCache instance is associated with this particular CachingMultiReader instance. Subsequent queries which operate on the same CachingMultiReader get a tremendous speedup if they can reuse the associated FieldCache instances. The problem is that Jackrabbit creates a new CachingMultiReader _every time_ one of the underlying indexes is modified. This means that if you change just _one_ item in the repository, all those FieldCaches have to be rebuilt, because the existing FieldCaches are associated with the old CachingMultiReader instance. This not only leads to slow response times for queries which contain range queries or are sorted by a field, but also to massive memory consumption (depending on the size of your indexes), because several CachingMultiReader instances might be in use at once if a lot of queries and item modifications are executed concurrently. As far as I understand, the solution is to use a MultiSearcher which works on multiple IndexReaders. Since most of the indexes are stable due to the merging strategy, the FieldCaches could then be reused for a much longer time.
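The effect described in point 2 can be illustrated with a small, self-contained simulation. It does not use Lucene or Jackrabbit classes; FakeSegment, FakeComposite and ReaderKeyedCache are hypothetical stand-ins, and the only assumption taken from the text above is that cache entries are keyed by reader instance. The segment sizes (1000, 100 and 10 values) are made up for illustration.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the reader of one index segment. */
final class FakeSegment {
    final long[] dates;
    FakeSegment(int size) { this.dates = new long[size]; }
}

/** Hypothetical stand-in for a composite reader spanning several segments. */
final class FakeComposite {
    final List<FakeSegment> segments;
    FakeComposite(List<FakeSegment> segments) { this.segments = segments; }
}

/** Cache keyed by reader identity; counts how many field values had to be loaded. */
final class ReaderKeyedCache {
    private final Map<Object, long[]> entries = new IdentityHashMap<>();
    long valuesLoaded = 0;

    long[] forSegment(FakeSegment reader) {
        long[] cached = entries.get(reader);
        if (cached == null) {
            cached = reader.dates.clone();
            valuesLoaded += cached.length;
            entries.put(reader, cached);
        }
        return cached;
    }

    long[] forComposite(FakeComposite reader) {
        long[] cached = entries.get(reader);
        if (cached == null) {
            int total = 0;
            for (FakeSegment s : reader.segments) total += s.dates.length;
            cached = new long[total];
            int offset = 0;
            for (FakeSegment s : reader.segments) {
                System.arraycopy(s.dates, 0, cached, offset, s.dates.length);
                offset += s.dates.length;
            }
            valuesLoaded += total;
            entries.put(reader, cached);
        }
        return cached;
    }
}

final class FieldCacheDemo {
    /** One cache entry per composite reader; a new composite is built after each change. */
    static long compositeCost() {
        ReaderKeyedCache cache = new ReaderKeyedCache();
        FakeSegment big = new FakeSegment(1000);
        FakeSegment medium = new FakeSegment(100);
        cache.forComposite(new FakeComposite(Arrays.asList(big, medium, new FakeSegment(10))));
        // One item changes: only the small segment is replaced, but the composite is new.
        cache.forComposite(new FakeComposite(Arrays.asList(big, medium, new FakeSegment(11))));
        return cache.valuesLoaded;
    }

    /** One cache entry per segment reader; stable segments keep their entries. */
    static long perSegmentCost() {
        ReaderKeyedCache cache = new ReaderKeyedCache();
        FakeSegment big = new FakeSegment(1000);
        FakeSegment medium = new FakeSegment(100);
        for (FakeSegment s : Arrays.asList(big, medium, new FakeSegment(10))) cache.forSegment(s);
        for (FakeSegment s : Arrays.asList(big, medium, new FakeSegment(11))) cache.forSegment(s);
        return cache.valuesLoaded;
    }

    public static void main(String[] args) {
        System.out.println("composite reader:    " + compositeCost() + " values loaded");
        System.out.println("per-segment readers: " + perSegmentCost() + " values loaded");
    }
}
```

In this toy setup two queries around a single modification load 2221 values when the cache is keyed by the composite reader, but only 1121 when it is keyed by the individual segment readers, because the two stable segments are never reloaded. Whether a MultiSearcher in Jackrabbit would actually achieve the same effect is exactly the open question here.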

I quickly tried to modify SearchIndex to use a MultiSearcher with multiple IndexReaders, each wrapped in an IndexSearcher, but wasn't successful: somewhere in DescendantSelfAxisWeight the index readers are required to implement HierarchyResolver, which ReadOnlyIndexReader doesn't.

So I thought I'd ask what you think about these two ideas before spending too much time going down the wrong path ;)

Cheers,
Christoph
