Optimize search performance

Christoph Kiehl Fri, 08 Jun 2007 03:49:07 -0700

Hi everyone,

I had a look at the search-related code during the last few days, because we need better performance for range queries on date fields as well as for sorting by date fields. These are my thoughts so far:

1. Wouldn't it make sense to exclude the index for the "jcr:system" tree (which is located at repository/index by default) if the query to execute doesn't include items from the "jcr:system" tree? Take for example a query like "my:app//element(*, foo:bar)". This query only searches for nodes located under "my:app", which excludes nodes from "jcr:system", and therefore doesn't need to search the "jcr:system" index. As the "jcr:system" index might grow quite quickly if you create a lot of versions, it might be worth excluding it. I'm not sure, though, how hard it would be to find out whether a query needs to include the "jcr:system" index.
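In its simplest form, such a check might only look at the fixed leading steps of the query path. The sketch below is hypothetical: the class name `SystemIndexCheck` and the idea of a pre-parsed list of leading steps (with any /jcr:root prefix already stripped) are assumptions, not Jackrabbit API. It errs on the side of including the system index whenever it cannot rule it out.

```java
import java.util.List;

// Hypothetical check, not Jackrabbit code: decide from the fixed leading path
// steps of a query (e.g. ["my:app"] for "my:app//element(*, foo:bar)")
// whether the query could possibly match items under /jcr:system.
final class SystemIndexCheck {
    static final String SYSTEM = "jcr:system";

    static boolean needsSystemIndex(List<String> leadingSteps) {
        if (leadingSteps.isEmpty()) {
            // e.g. "//element(*, foo:bar)": descends from the root,
            // so it may match versions stored under jcr:system
            return true;
        }
        String first = leadingSteps.get(0);
        // A wildcard first step (e.g. "*/foo") could also match jcr:system,
        // so stay conservative and keep the system index in that case.
        return first.equals(SYSTEM) || first.contains("*");
    }
}
```

For "my:app//element(*, foo:bar)" the leading steps are just ["my:app"], so the system index would be skipped; a query starting with "//", "*" or "jcr:system" keeps it.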

2. Lucene uses FieldCaches to speed up sorting and range queries, which is exactly what we are after. Those FieldCaches are kept per IndexReader. Jackrabbit uses an IndexSearcher that searches a single IndexReader, which is most likely an instance of CachingMultiReader. So on every search that builds up a FieldCache, that FieldCache instance is associated with this instance of CachingMultiReader. Successive queries that operate on the same CachingMultiReader get a tremendous speedup if they can reuse those associated FieldCache instances.

The problem is that Jackrabbit creates a new CachingMultiReader _every time_ one of the underlying indexes is modified. This means that if you change just _one_ item in the repository, all those FieldCaches need to be rebuilt, because the existing FieldCaches are associated with the old instance of CachingMultiReader. This not only leads to slow response times for queries that contain range queries or are sorted by a field, but also to massive memory consumption (depending on the size of your indexes), because multiple CachingMultiReader instances might be in use at once if a lot of queries and item modifications are executed concurrently.

As far as I understand, the solution is to use a MultiSearcher over multiple IndexReaders. Since, due to the merging strategy, most of the indexes are stable, their FieldCaches could then be reused for a much longer time.
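To make the cost difference concrete, here is a toy model with no Lucene dependency; `ToyReader`, `ToyFieldCache` and `CacheScenario` are hypothetical names and the index sizes are illustrative only. The cache is keyed by reader instance, mirroring the per-IndexReader keying described above. Replacing the single composite reader invalidates everything, while keeping one reader per sub-index means only the small, changed index has to be reloaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for an index reader. No equals/hashCode override,
// so cache lookups are by object identity, like a real reader instance.
final class ToyReader {
    final int maxDoc;
    ToyReader(int maxDoc) { this.maxDoc = maxDoc; }
}

// Toy per-reader cache: one value array per reader instance.
final class ToyFieldCache {
    private final Map<ToyReader, long[]> cache = new WeakHashMap<>();
    long entriesBuilt = 0; // total slots filled so far (the expensive part)

    long[] get(ToyReader r) {
        return cache.computeIfAbsent(r, k -> {
            entriesBuilt += k.maxDoc; // simulate loading one value per document
            return new long[k.maxDoc];
        });
    }
}

final class CacheScenario {
    // Entries rebuilt by the second query after one item changed,
    // when a single composite reader covers all sub-indexes.
    static long compositeRebuild() {
        ToyFieldCache fc = new ToyFieldCache();
        ToyReader multi = new ToyReader(2010);
        fc.get(multi);
        long before = fc.entriesBuilt;
        multi = new ToyReader(2010); // any change => brand-new composite instance
        fc.get(multi);
        return fc.entriesBuilt - before;
    }

    // Same scenario, but with one reader per sub-index.
    static long perSubReaderRebuild() {
        ToyFieldCache fc = new ToyFieldCache();
        List<ToyReader> subs = new ArrayList<>(List.of(
                new ToyReader(1000), new ToyReader(1000), new ToyReader(10)));
        subs.forEach(fc::get);
        long before = fc.entriesBuilt;
        subs.set(2, new ToyReader(10)); // only the small, volatile index changes
        subs.forEach(fc::get);
        return fc.entriesBuilt - before;
    }
}
```

With these sizes (two stable indexes of 1000 documents and one volatile index of 10), the second query refills 2010 entries in the composite case and only 10 in the per-sub-index case.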

I just tried to quickly modify SearchIndex to use a MultiSearcher with multiple IndexReaders wrapped by IndexSearchers, but wasn't successful, because somewhere in DescendantSelfAxisWeight the index readers are required to implement HierarchyResolver, which ReadOnlyIndexReader doesn't.
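Part of what could make such a split awkward is that Lucene document numbers are local to each reader: a composite reader numbers the documents of its sub-readers consecutively using per-sub-reader start offsets, while an IndexSearcher over a single sub-index only sees local numbers. Any code that passes document numbers between indexes, as hierarchy resolution presumably does, would have to translate them. Whether this is the actual obstacle in DescendantSelfAxisWeight is an assumption. The sketch below shows just that translation; `DocNumberMap` is a hypothetical helper, not a Jackrabbit or Lucene class.

```java
// Hypothetical helper: maps between global document numbers (as seen through
// one composite reader) and (sub-index, local number) pairs, using the same
// start-offset scheme a multi-reader uses internally.
final class DocNumberMap {
    private final int[] starts; // starts[i] = first global doc number of sub-index i

    DocNumberMap(int... maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    // Largest sub-index i with starts[i] <= n; this skips empty sub-indexes.
    // A linear scan keeps the sketch simple; a real one would binary-search.
    int subIndex(int n) {
        if (n < 0 || n >= starts[starts.length - 1]) {
            throw new IndexOutOfBoundsException("doc " + n);
        }
        int i = starts.length - 2;
        while (starts[i] > n) {
            i--;
        }
        return i;
    }

    int toLocal(int n) { return n - starts[subIndex(n)]; }

    int toGlobal(int sub, int local) { return starts[sub] + local; }
}
```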

So I thought I might ask you for some insight into what you think about those two ideas before spending too much time walking down the wrong path ;)


Cheers,
Christoph
