Hi Christoph

The directory caching is applied *across* class instances (the directory is instantiated once) - this cache exists as a single shared instance and is updated if FSDirectory is called against a different index.
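To make the terms concrete, here is a minimal sketch of a static, per-JVM cache of directory instances keyed by path. DirCacheSketch and CachedDir are hypothetical names made up for illustration; this is not the Lucene source, only the general shape of a cache that is shared by every caller in one process.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-index directory object; not Lucene's FSDirectory.
final class CachedDir {
    final Path path;
    CachedDir(Path path) { this.path = path; }
}

final class DirCacheSketch {
    // One static map shared by every caller in this JVM, keyed by normalized path.
    private static final Map<Path, CachedDir> CACHE = new HashMap<>();

    // Returns the single shared instance for this path, creating it on first use.
    static CachedDir get(String path) {
        Path key = Paths.get(path).toAbsolutePath().normalize();
        synchronized (CACHE) { // every caller queues here, whichever index it wants
            CachedDir dir = CACHE.get(key);
            if (dir == null) {
                dir = new CachedDir(key);
                CACHE.put(key, dir);
            }
            return dir;
        }
    }

    public static void main(String[] args) {
        CachedDir a1 = get("libA");
        CachedDir a2 = get("libA");
        CachedDir b = get("libB");
        System.out.println(a1 == a2); // same path, same instance
        System.out.println(a1 == b);  // different path, different instance
    }
}
```

In this sketch two calls with the same path hand back the same object, and every caller passes through the one synchronized block regardless of which index it is after. A second JVM would build its own, completely separate map.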
Multiple indexes will *always* cause directory caching upon calls to FSDirectory. Our searches are made sequentially against all libraries (or a selection of libraries), and this sequential call to FSDirectory causes the cache to be updated - it's very, very rare that the cache will remain the same between two calls to get FSDirectories.

This caching *is* synchronized using the commit.lock (see the code). Two processes (independent JVMs) will have two completely separate caches, *but* they are tied together by the commit lock. This is what causes the spin.

> FSDirectory.getDirectory has nothing to do with a commit.lock!

Err, wrong. The directory.makeLock(IndexWriter.COMMIT_LOCK_NAME) call from within the IndexReader.open routine ties the commit.lock to the FSDirectory by synchronising the code around a *static* instance of the directory object (see the code!!).

Cheers
Pete Lewis

----- Original Message -----
From: "Christoph Goller" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Monday, September 13, 2004 11:34 AM
Subject: Re: Lock handling and Lucene 1.9 / 2.0

> Pete Lewis wrote:
> > Hi Christoph
> >
> > Long answer - there's a heap of horrible, horrible code in the FSDirectory that tries to be clever and I think it's not quite working correctly.
> >
> > Two types of lock - write.lock and commit.lock. The write.lock is used exclusively for synchronising the indexing of documents and has *no* impact on searching whatsoever.
> >
> > Commit.lock is another little story. Commit.lock is used for two things - stopping indexing processes from overwriting segments that another one is currently using, and stopping IndexReaders from overwriting each other when they delete entries (don't even start asking me why a bloody IndexReader can delete documents).
>
> Commit.lock is used to synchronize commitment of changes to an index
> with the process of opening an IndexReader. These changes may come from
> an IndexWriter or an IndexReader.
> There are good reasons for having the
> delete functionality in IndexReader (see developer mailing list around
> July 16). Write.lock is used to guarantee that there is always only one
> writer.
>
> > *However*, there's another naughty little usage that isn't listed in any of the documentation, and here it is....
> >
> > Doug Cutting wrote FSDirectory in such a way that it caches a directory. Hence, if FSDirectory is called more than once with the same directory, the FSDirectory class uses a static Hashtable to return the current values. However, if FSDirectory is called with a *different* directory, it engages a commit.lock while it updates the values. It *also* makes that Hashtable (synchronised).
>
> FSDirectory.getDirectory has nothing to do with a commit.lock!
> Lucene currently uses 2 locking mechanisms, the interprocess
> mechanism with the commit.lock file and an intraprocess mechanism
> based on synchronization on directory instances. The 2nd mechanism
> needs unique directory instances and this is achieved by caching
> directory instances in FSDirectory.
>
> > Creating an IndexSearcher creates (within itself) an IndexReader to read the index. The first thing the IndexReader does is grab an FSDirectory for the index directory - if you are using Lucene with a single index, there is never a problem - it is read once, then cached.
> >
> > Our search process works by searching across all the selected libraries sequentially, building a results list and then culling the results it doesn't need. To search, it loops through each library and creates an IndexSearcher to get at the data.
> >
> > Starting to see the issue yet? Because each library is in a different directory, the internal call to the IndexReader, which then gets an FSDirectory, causes the FSDirectory to update its singular cache. Which forces a commit.lock to appear.
> >
> > Doug Cutting's little bit of 'neat' code for singularly caching the data within an FSDirectory is causing us immense headaches. The code is horrible:
> >
> > /** Returns an IndexReader reading the index in the given Directory. */
> > public static IndexReader open(final Directory directory) throws IOException {
> >   synchronized (directory) {  // in- & inter-process sync
> >     return (IndexReader)new Lock.With(
> >         directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
> >         IndexWriter.COMMIT_LOCK_TIMEOUT) {
> >       public Object doBody() throws IOException {
> >         SegmentInfos infos = new SegmentInfos();
> >         infos.read(directory);
> >         if (infos.size() == 1) {  // index is optimized
> >           return new SegmentReader(infos, infos.info(0), true);
> >         } else {
> >           SegmentReader[] readers = new SegmentReader[infos.size()];
> >           for (int i = 0; i < infos.size(); i++)
> >             readers[i] = new SegmentReader(infos, infos.info(i), i==infos.size()-1);
> >           return new SegmentsReader(infos, directory, readers);
> >         }
> >       }
> >     }.run();
> >   }
> > }
> >
> > Where directory is passed in from the constructor to IndexReader thus:
> >
> > return open( FSDirectory.getDirectory( path, false ) );
>
> All threads that open an IndexReader and that don't get a directory instance
> directly have to compete for FSDirectory.getDirectory synchronization,
> regardless of which index they are trying to open. So you are right. This
> is a bottleneck.
>
> After that, threads opening an IndexReader only compete with each other
> if they try to read the same index. This is handled by the two
> locking mechanisms mentioned above.
>
> Here are two ideas that could help:
> The bottleneck only occurs if you always start a new process for every search,
> doesn't it? If you make a second search within the same process,
> the directory instances will already be cached and the bottleneck won't be a
> problem. Furthermore, you do not have to always open new searchers for every
> search.
> Can't you reuse your Searcher instances for multiple searches?
>
> A question for Lucene 1.9/2.0 is whether we really need both intraprocess and
> interprocess synchronization. Maybe these two mechanisms exist for purely
> historical reasons and the interprocess mechanism alone would be enough?
>
> Christoph
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
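
On Christoph's suggestion of reusing Searcher instances across searches, a minimal sketch of what that could look like is below. SearcherPool and LibrarySearcher are hypothetical stand-ins written for this example (LibrarySearcher is not Lucene's IndexSearcher), and the sketch assumes one long-lived process serving many searches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive-to-open searcher; not Lucene's IndexSearcher.
final class LibrarySearcher {
    static final AtomicInteger OPENED = new AtomicInteger();
    final String indexPath;

    LibrarySearcher(String indexPath) {
        this.indexPath = indexPath;
        OPENED.incrementAndGet(); // counts how often an index is actually opened
    }

    String search(String query) {
        return indexPath + ":" + query; // placeholder result
    }
}

final class SearcherPool {
    private final Map<String, LibrarySearcher> searchers = new ConcurrentHashMap<>();

    // Opens a searcher the first time a library is queried, then reuses it.
    LibrarySearcher forIndex(String indexPath) {
        return searchers.computeIfAbsent(indexPath, LibrarySearcher::new);
    }

    public static void main(String[] args) {
        SearcherPool pool = new SearcherPool();
        String[] libraries = {"libA", "libB"};
        // Three sequential searches across both libraries.
        for (int round = 0; round < 3; round++) {
            for (String lib : libraries) {
                pool.forIndex(lib).search("lucene");
            }
        }
        System.out.println(LibrarySearcher.OPENED.get()); // one open per library
    }
}
```

With this shape, the open path (and whatever locking it involves) is only hit once per library rather than once per search. The trade-off is that a reused searcher does not see later changes to the index until it is reopened, so some refresh policy would be needed on top of this.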