Marcel Reutegger wrote:
Christoph Kiehl wrote:
Christoph Kiehl wrote:
I was digging a bit into Jackrabbit today and found another place
where some caching did provide a substantial performance gain to
queries which check one attribute for more than one value (like
/foo/*[@foo:bar='john' or @foo:bar='doe']). The BitSet in
calculateDocFilter() is right now created twice for the query above.
On large repositories this takes about 200ms per BitSet on my machine
for a particular field. Caching these BitSets per IndexReader and
field in a WeakHashMap with the IndexReader as a key gave me some
real improvements.
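
A minimal sketch of the caching idea described above, not the actual Jackrabbit patch. The class name DocFilterCache, the get() method and the compute callback are hypothetical; the reader is typed as Object here only to keep the example free of a Lucene dependency (in practice the key would be the IndexReader), and the callback stands in for the expensive scan done in calculateDocFilter().

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of Jackrabbit: caches one BitSet per
 * (reader, field) pair. The reader is only weakly referenced by the map,
 * so its entries become collectable once nothing else holds the reader.
 */
public class DocFilterCache {

    // Outer key: the index reader (held weakly). Inner key: the field name.
    private final Map<Object, Map<String, BitSet>> cache = new WeakHashMap<>();

    /**
     * Returns the cached BitSet for the given reader and field, computing
     * and storing it on a cache miss.
     */
    public synchronized BitSet get(Object reader, String field,
                                   Supplier<BitSet> compute) {
        Map<String, BitSet> perField =
                cache.computeIfAbsent(reader, r -> new HashMap<>());
        BitSet bits = perField.get(field);
        if (bits == null) {
            bits = compute.get();   // the expensive part, done once per key
            perField.put(field, bits);
        }
        return bits;
    }
}
```

Two things to keep in mind with this approach: the cached values must not hold a strong reference back to the reader, otherwise the weak key is never released, and callers have to treat the returned BitSet as read-only because it is shared between queries.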
agreed, this should definitely be cached per index segment and is
doable with reasonable effort.
I've created a jira issue: http://issues.apache.org/jira/browse/JCR-791
Are you working on this issue? Or should I try to implement something?
- I was referring to calculateDocFilter() in
org.apache.jackrabbit.core.query.lucene.MatchAllScorer
- The achieved performance improvement varied between 30-60% depending
on the actual query
but that means your query is rather:
/foo/*[@foo:bar]
right?
Actually it's /foo/*[@foo:bar!='john']
@foo:bar='john' should be translated into a term query.
You are right. "="-comparisons translate into term queries whereas
"!="-comparisons get translated into MatchAllQueries.
It seems like if I rewrite the following query from
/foo/*[@foo:bar!='john' and @foo:bar!='doe']
to
/foo/*[not(@foo:bar='john' or @foo:bar='doe')]
I get better performance. Can you confirm this?
Cheers,
Christoph