Pete Lewis wrote:
Hi Christoph
The directory caching is applied *across* class instances (the directory
is instantiated once) - there is a single cache, and it is updated whenever
FSDirectory is called against a different index.
Yes. That's how we guarantee that every process has at most one FSDirectory
instance for every index.
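A minimal sketch of that guarantee, assuming the cache is a static map keyed
by the normalized index path. CachedDirectory and its methods are
hypothetical stand-ins for illustration, not Lucene's actual FSDirectory code:

```java
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-index directory object (not Lucene's FSDirectory).
final class CachedDirectory {
    // One static cache per process (more precisely, per class loader).
    private static final Map<String, CachedDirectory> CACHE = new HashMap<>();

    private final String path;

    private CachedDirectory(String path) {
        this.path = path;
    }

    // Returns the single instance for this index path, creating it on first use.
    static CachedDirectory getDirectory(String indexPath) {
        // Normalize so that "index-a" and "./index-a" share one cache entry.
        String key = Paths.get(indexPath).toAbsolutePath().normalize().toString();
        synchronized (CACHE) {
            // Every opening thread passes through this monitor,
            // whichever index it is asking for.
            CachedDirectory dir = CACHE.get(key);
            if (dir == null) {
                dir = new CachedDirectory(key);
                CACHE.put(key, dir);
            }
            return dir;
        }
    }

    String path() {
        return path;
    }
}
```

Two lookups for the same index return the same object, so threads in one
process can synchronize on it; lookups for different indexes return
different objects.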
Multiple indexes will *always* cause directory caching upon calls to
FSDirectory - our searches are made sequentially against all libraries
(or a selection of libraries) and this sequential call to FSDirectory
causes the cache to be updated - it's very, very rare that the cache will
remain the same between two calls to get FSDirectories. This caching
*is* synchronized using the commit.lock (see the code) and two processes
will attain two different caches (completely separate) *but* are tied
together by the commit lock. This is what causes the spin.
As I said, synchronization on directory instances is the in-process
synchronization mechanism. The commit.lock mechanism is the inter-process
synchronization mechanism. If you use one process (independent JVM) for
more than one search, you will get hits in the directories cache.
The in-process mechanism is the first synchronization made when
opening an IndexReader. Obviously you need a directory instance to
synchronize on, and this instance has to be unique for your process.
In order to get the directory instance we synchronize on the static
directory cache, and this may be a bottleneck, since all opening
threads, regardless of the index they are trying to open, have to
synchronize on the static cache. However, accessing the hashtable should
be fast, shouldn't it? In order to get the directory instance you do not
need a commit.lock. That's what I meant by:
FSDirectory.getDirectory has nothing to do with a commit.lock!
Err, wrong. The directory.makeLock(IndexWriter.COMMIT_LOCK_NAME) call
from within the IndexReader.open routine ties the commit.lock to the
FSDirectory by synchronising the code around a *static* instance of the
directory object (see the code!!).
After synchronizing all threads of one process that try to open the same
index (on the directory), the inter-process mechanism with
the commit.lock is applied in order to synchronize with other processes
that might try to open the same index.
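The ordering described here can be sketched as follows. This is only an
illustration under stated assumptions: the inter-process lock is modelled as
a lock file created atomically with File.createNewFile(), and TwoLevelLock,
withLocks and the 50 ms poll interval are hypothetical, not Lucene's code:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical helper showing the two synchronization levels in order.
final class TwoLevelLock {
    private TwoLevelLock() {}

    static <T> T withLocks(Object directory, File lockFile,
                           long timeoutMillis, Supplier<T> body) {
        // Level 1 (in-process): threads opening the same index queue up here.
        synchronized (directory) {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            try {
                // Level 2 (inter-process): poll until the lock file can be created.
                while (!lockFile.createNewFile()) {
                    if (System.currentTimeMillis() >= deadline) {
                        throw new IllegalStateException("Lock obtain timed out: " + lockFile);
                    }
                    Thread.sleep(50); // assumed poll interval
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted waiting for " + lockFile, e);
            }
            try {
                return body.get(); // e.g. read the segment list while both locks are held
            } finally {
                lockFile.delete(); // release the inter-process lock
            }
        }
    }
}
```

Because the file lock is polled while the directory monitor is still held,
one process waiting on another also stalls every other thread in that
process that wants to open the same index.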
So there are 3 places where synchronization is done. Could you please tell
me again which of them is, in your opinion, the most critical, and whether
you have any ideas how we could improve synchronization?
One of your ideas was to turn off the commit.lock mechanism. However, I think
we cannot give up inter-process synchronization....
Furthermore, what are the implications of these synchronization problems for
your application? Do they just make application start-up slow, or do they
slow down every search? The question is, of course, about the case where
processes and searcher instances are reused for more than one query/search.
Everything else is simply using Lucene in the wrong way.
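To illustrate what reusing a searcher means, here is a rough sketch in which
the open cost (and with it the locking) is paid once per index rather than
once per query. SearcherCache and its opener function are hypothetical; S
stands for whatever searcher type the application actually uses:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical per-process cache: one long-lived searcher per index path.
final class SearcherCache<S> {
    private final Map<String, S> searchers = new ConcurrentHashMap<>();
    private final Function<String, S> opener; // assumed: opens a searcher for a path
    private final AtomicInteger opens = new AtomicInteger();

    SearcherCache(Function<String, S> opener) {
        this.opener = opener;
    }

    // Opens the searcher on first request; later queries reuse the same instance.
    S get(String indexPath) {
        return searchers.computeIfAbsent(indexPath, p -> {
            opens.incrementAndGet();
            return opener.apply(p);
        });
    }

    // Number of times a searcher was actually opened.
    int openCount() {
        return opens.get();
    }
}
```

With this pattern a sequential search over many libraries opens each index
once at start-up, and later searches do not go through the open path at all.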
regards,
Christoph