[
https://issues.apache.org/jira/browse/JCR-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481569
]
Marcel Reutegger commented on JCR-791:
--------------------------------------
Here's what I've done so far:
- Introduced a MultiIndexReader interface that allows to access the sub index
readers.
- CachingMultiReader and SearchIndex.CombinedIndexReader now implement
MultiIndexReader
- Created a MultiScorer which spans multiple sub scorers and combines. The
MultiScorer exposes the sub scorers as if there is just a single scorer.
- Changed MatchAllWeight to create individual scorers for each sub IndexReader
contained in a MultiIndexReader and finally combines them into a MultiScorer.
- Introduced a BitSet cache in MatchAllScorer
I then conducted the following tests:
Setup:
- 50'000 nodes
- resultFetchSize: 50
- respectDocumentOrder: false
100 queries: //element(*, nt:unstructured)[EMAIL PROTECTED]
(only size of NodeIterator is read, no node access)
Results:
1) with jackrabbit 1.2.3:
82078 ms
2) with MatchAllScorer per index segment
combined with MultiScorer without caching:
10297 ms
3) with MatchAllScorer per index segment
combined with MultiScorer with caching:
6156 ms
My conclusion is that the the lucene MultiTermDocs implementation adds
significant cost when a single MatchAllScorer is used in test scenario 1). And
it actually makes sense. If a single MatchAllScorer is used, lucene has to
merge sort the @foo terms of several index segments, while in the test
scenarios 2) and 3) no merge sort is needed for the @foo terms.
With the changes the query performance even seems good enough even without
caching.
I'm tempted to only check the changes without caching because the additional
performance improvement with caching does not seem to warrant the memory
consumption of the cache: 2) decreases the query time compared to the current
implementation by 87% while 3) decreases query time by 92%.
> Cache BitSet per IndexReader in MatchAllScorer
> ----------------------------------------------
>
> Key: JCR-791
> URL: https://issues.apache.org/jira/browse/JCR-791
> Project: Jackrabbit
> Issue Type: Improvement
> Components: query
> Reporter: Marcel Reutegger
> Priority: Minor
>
> The BitSets created in MatchAllScorer should be cached per IndexReader. This
> enhancement should also take care that the supplied IndexReader may in fact
> be a CombinedIndexReader or a CachingMultiReader with multiple contained
> IndexReaders. To achieve a good cache efficiency the BitSets must be cached
> per contained IndexReader and combined later.
> See also thread on dev list:
> http://thread.gmane.org/gmane.comp.apache.jackrabbit.devel/10976
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.