thomasg wrote:
Yes, I ran another query straight after and the execution time dropped from
37000ms to 16ms! So the delay when the query is first run is due to text
extraction and indexing having to complete?
This leads to another question. Page 59 of Lucene In Action states:
"Any number of read-only operations may be executed while an index is being
modified. For example, users can search an index while it's being optimized
or while new documents are being added to the index, updated, or deleted
from the index."
This does not seem to be how Lucene behaves when it is integrated into
Jackrabbit. Is there a reason for the index not being readable while the
extraction / indexing takes place?
what you see in your example when you add a large document is also about
consistency. jackrabbit guarantees that once you add a node it is
immediately searchable, even though there is some buffering going on
behind the scenes. executing a query earlier, on an incomplete index,
could lead to wrong results.
in general jackrabbit allows you to make changes to the index while
queries are running. the other way round is not always true. if a reader
is already open on the index while changes are being made, you can still
execute a query on it. but it may happen that no reader is open at all,
in which case the 'reading' thread is blocked until the writing is
completed.
you can try this out by executing queries in jackrabbit with one session
while adding the large document with another session. the first thread
should keep executing its queries even while the document is being added.
at least that's the intended behaviour ;)
regards
marcel
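
The reader behaviour described in the reply can be illustrated with a small toy model. This is only a sketch under stated assumptions: it uses plain JDK classes, not the Jackrabbit or Lucene APIs, and the names ToyIndexDemo, openReader, add and runScenario are hypothetical. It assumes that a write holds a lock for its whole duration, that an already open reader works on a snapshot, and that opening a new reader has to wait for the lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: NOT Jackrabbit's or Lucene's actual implementation.
class ToyIndexDemo {
    // Held for the whole duration of a write (extraction + indexing).
    private final ReentrantLock writeLock = new ReentrantLock();
    // Last committed, immutable view of the index.
    private volatile List<String> committed = Collections.emptyList();

    // Opening a reader waits for any write in progress, then pins a snapshot.
    List<String> openReader() {
        writeLock.lock();
        try {
            return committed;
        } finally {
            writeLock.unlock();
        }
    }

    // Adds a document; the sleep stands in for slow text extraction.
    void add(String doc, long extractionMillis, CountDownLatch started) {
        writeLock.lock();
        try {
            started.countDown(); // signal that the write is now in progress
            Thread.sleep(extractionMillis);
            List<String> next = new ArrayList<>(committed);
            next.add(doc);
            committed = Collections.unmodifiableList(next);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            writeLock.unlock();
        }
    }

    // Counts the documents in a reader's snapshot that contain the term.
    static long count(List<String> reader, String term) {
        return reader.stream().filter(d -> d.contains(term)).count();
    }

    // Returns {hits via a reader opened before the write,
    //          hits via a reader opened while the write is running}.
    static long[] runScenario(long extractionMillis) {
        ToyIndexDemo index = new ToyIndexDemo();
        List<String> openBefore = index.openReader();
        CountDownLatch started = new CountDownLatch(1);
        Thread writer = new Thread(
                () -> index.add("large document", extractionMillis, started));
        writer.start();
        try {
            started.await(); // wait until the writer holds the lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Answers immediately from the old snapshot, without blocking.
        long viaOpenReader = count(openBefore, "large");
        // Blocks until the write completes, then sees the new document.
        long viaNewReader = count(index.openReader(), "large");
        return new long[] {viaOpenReader, viaNewReader};
    }

    public static void main(String[] args) {
        long[] hits = runScenario(200);
        System.out.println("open reader: " + hits[0] + ", new reader: " + hits[1]);
        // prints "open reader: 0, new reader: 1"
    }
}
```

In this model the query on the already open reader returns at once with the old state, while the query that first has to open a reader waits for the write to finish and then sees the new document. That is roughly the difference between the 16ms and the 37000ms query in the question, though the real timings depend on Jackrabbit's internals.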