Hi Christoph

Let's stand back a second and ask: why do we have commit locks when searching?
The answer comes down to handling the FSDirectory, where the methods used are
not J2EE compliant. We could open another can of worms and ask why the
IndexReader deletes, but I won't go into that argument again here.

The bottom line is that we need the ability to search without waiting on a
commit lock. The FSDirectory is where the problems lie. We could hack the code
to make it work for our particular application; however, what I've been trying
to get across is the need for a method that gives users the capability to just
search (not delete) without waiting on the commit lock, that is J2EE compliant,
and that is appropriate for clustered implementations - and that this should be
a candidate for Lucene 1.9 / 2.0.

You say that it shouldn't take long to wait. A 1 sec spin lock per index per
process is an eternity when trying to scale for potentially thousands of users.

Cheers

Pete Lewis

----- Original Message -----
From: "Christoph Goller" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Tuesday, September 14, 2004 8:57 AM
Subject: Re: Lock handling and Lucene 1.9 / 2.0

> Pete Lewis wrote:
> > Hi Christoph
> >
> > The directory caching is applied *across* class instances (the directory
> > is instanced once) - this cache exists singularly and is updated if the
> > FSDirectory is called against a different index.
>
> Yes. That's how we guarantee that every process has at most one FSDirectory
> instance for every index.
>
> >
> > Multiple indexes will *always* cause directory caching upon calls to
> > FSDirectory - our searches are made sequentially against all libraries
> > (or a selection of libraries) and this sequential call to FSDirectory
> > causes the cache to be updated - it's very, very rare that the cache will
> > remain the same between two calls to get FSDirectories.
> > This caching *is* synchronized using the commit.lock (see the code) and
> > two processes will attain two different caches (completely
> > separate) *but* are tied together by the commit lock. This is what
> > causes the spin.
>
> As I said, synchronization on directory instances is the in-process
> synchronization mechanism. The commit.lock mechanism is the inter-process
> synchronization mechanism. If you use one process (independent JVM) for
> more than one search, you will get hits in the directories cache.
>
> The in-process mechanism is the first synchronization made when
> opening an IndexReader. Obviously you need a directory instance to
> synchronize on, and this instance has to be unique for your process.
> In order to get the directory instance we synchronize on the static
> directory cache, and this may be a bottleneck since all opening
> threads, independent of the index they are trying to open, have to
> synchronize on the static cache. However, accessing the hashtable should
> be fast, shouldn't it? In order to get the directory instance you do not
> need a commit.lock. That's what I meant by:
>
> >>FSDirectory.getDirectory has nothing to do with a commit.lock!
> >
> >
> > Err, wrong. The directory.makeLock(IndexWriter.COMMIT_LOCK_NAME) call
> > from within the IndexReader.open routine ties the commit.lock to the
> > FSDirectory by synchronising the code around a *static* instance of the
> > directory object (see the code!!).
>
> After synchronizing all threads of one process that try to open the same
> index (on the directory), the inter-process mechanism with
> the commit.lock is applied in order to synchronize with other processes
> that might try to open the same index.
>
> So there are 3 places where synchronization is done. Could you please tell
> me again which of these is, in your opinion, the most critical, and whether
> you have any ideas on how we could improve synchronization?
>
> One of your ideas was to turn off the commit.lock mechanism.
> However, I think we cannot give up inter-process synchronization....
>
> Furthermore, what are the implications of these synchronization problems for
> your application? Do they just make application start-up slow, or do they
> slow down every search? This assumes, of course, that processes and searcher
> instances are reused for more than one query/search. Everything else is
> simply using Lucene in the wrong way.
>
> regards,
> Christoph
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
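
For reference, the three synchronization points discussed in this thread (the
static directory cache, the per-directory monitor, and the commit.lock file)
can be modelled roughly as below. This is a simplified sketch and not the
Lucene source: the class SyncSketch, its methods, and the 10 s timeout are
hypothetical names and values chosen for illustration, and the 1 s poll
interval is assumed from the "1 sec spin" mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the three synchronization points; not Lucene code.
final class SyncSketch {
    static final long POLL_MS = 1_000;     // assumed from the "1 sec spin" in the thread
    static final long TIMEOUT_MS = 10_000; // hypothetical timeout for this sketch

    // (1) Static per-process cache: every opening thread passes through here,
    //     whichever index it wants, but only for a map lookup.
    private static final Map<String, Object> DIRECTORIES = new HashMap<>();

    static Object getDirectory(String path) {
        synchronized (DIRECTORIES) {
            return DIRECTORIES.computeIfAbsent(path, p -> new Object());
        }
    }

    // (3) Inter-process lock modelled as a lock file that is created atomically
    //     and polled until it becomes free or the timeout expires.
    static boolean obtain(File lockFile, long timeoutMs, long pollMs)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!lockFile.createNewFile()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(pollMs); // the wait that a blocked searcher pays
        }
        return true;
    }

    // Opening a reader: cache lookup, then the in-process monitor on the
    // directory instance (2), then the inter-process lock file (3).
    static String open(String indexPath, File lockFile)
            throws IOException, InterruptedException {
        Object dir = getDirectory(indexPath);
        synchronized (dir) {
            if (!obtain(lockFile, TIMEOUT_MS, POLL_MS)) {
                throw new IOException("Lock obtain timed out: " + lockFile);
            }
            try {
                return "reader for " + indexPath; // placeholder for reading the segments
            } finally {
                lockFile.delete(); // release the lock for other processes
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File lock = File.createTempFile("commit", ".lock");
        lock.delete(); // start with the lock free
        System.out.println(open("/tmp/index-a", lock));
    }
}
```

In this model the lookup in step (1) is cheap; the cost under discussion is the
sleep in obtain, since a searcher that finds the lock file held by another
process waits a full poll interval before it retries.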