Hi Christoph,

We are running a cluster of 4 multi-processor Sun servers with BEA WebLogic. We are using Lucene for the search component and have multiple indexes on a SAN, where all indexes are accessible from all of the servers in the cluster.
During performance testing we found that Lucene seemed to be taking a lot of resources. When the system was "stressed" we took a number of thread dumps; most of the threads that are doing work appear to be tied up within Lucene. I've included a couple of examples from the dumps for you to look at.

"ExecuteThread: '20' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0xcbee60 nid=0x22 waiting for monitor entry [8c2fd000..8c2ffc24]
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:127)
    - waiting to lock <be2641c8> (a org.apache.lucene.store.FSDirectory)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:101)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
    at com.uptima.usi.searchAPI.Search.performSearch(Unknown Source)

"ExecuteThread: '15' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x7d1b58 nid=0x1d waiting on condition [8c7fd000..8c7ffc24]
    at java.lang.Thread.sleep(Native Method)
    at org.apache.lucene.store.Lock$With.run(Lock.java:109)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:103)
    - locked <c895c5a8> (a org.apache.lucene.store.FSDirectory)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
    at com.uptima.usi.searchAPI.Search.performSearch(Unknown Source)

These are just examples; each thread dump typically has 5-10 threads tied up in this way. Obviously, code that is doing a Thread.sleep on the server side is a bit worrying! So we dug in a bit more...

Long answer: there's a heap of horrible, horrible code in FSDirectory that tries to be clever, and I think it's not quite working correctly.

There are two types of lock: write.lock and commit.lock. The write.lock is used exclusively for synchronising the indexing of documents and has *no* impact on searching whatsoever. Commit.lock is another little story. Commit.lock is used for two things: stopping indexing processes from overwriting segments that another process is currently using, and stopping IndexReaders from overwriting each other when they delete entries (don't even start asking me why a bloody IndexReader can delete documents). *However*, there's another naughty little usage that isn't listed in any of the documentation, and here it is...

Doug Cutting wrote FSDirectory in such a way that it caches directories. Hence, if FSDirectory is called more than once with the same directory, the FSDirectory class returns the cached values from a static Hashtable. However, if FSDirectory is called with a *different* directory, a commit.lock is engaged while the values are updated, and access to that Hashtable is synchronised as well.

Creating an IndexSearcher creates (within itself) an IndexReader to read the index, and the first thing the IndexReader does is grab an FSDirectory for the index directory. If you are using Lucene with a single index there is never a problem: it is read once, then cached.

Our search process works by searching across all of the selected libraries sequentially, building a results list and then culling the results it doesn't need. To search, it loops through each library and creates an IndexSearcher to get at the data. Starting to see the issue yet? Because each library is in a different directory, the internal call to the IndexReader, which then gets an FSDirectory, causes FSDirectory to update its single cache, which forces a commit.lock to appear.
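To make the pattern concrete, here is a rough sketch of the kind of loop our search code runs. The class and method names are only illustrative (not our actual API), but the shape is right: one IndexSearcher is created per library directory, per request, so every iteration goes back through IndexReader.open() and FSDirectory.getDirectory() rather than hitting a warm cache.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    // Illustrative sketch only -- "MultiLibrarySearch" and "libraryPaths" are made up.
    public class MultiLibrarySearch {
        public List search(Query query, String[] libraryPaths) throws IOException {
            List allHits = new ArrayList();
            for (int i = 0; i < libraryPaths.length; i++) {
                // Each library lives in a different directory, so this constructor
                // goes through IndexReader.open() -> FSDirectory.getDirectory() and
                // exercises the synchronisation / commit.lock path described above,
                // for every library, for every concurrent request.
                IndexSearcher searcher = new IndexSearcher(libraryPaths[i]);
                try {
                    Hits hits = searcher.search(query);
                    for (int j = 0; j < hits.length(); j++) {
                        allHits.add(hits.doc(j));
                    }
                } finally {
                    searcher.close();
                }
            }
            // culling of unwanted results happens after all libraries are searched
            return allHits;
        }
    }

Multiply that loop by the number of concurrent requests under load and you get thread dumps like the ones above: some threads queued on the FSDirectory monitor, others asleep inside Lock.With waiting on the commit.lock.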
Doug Cutting's little bit of 'neat' code for caching FSDirectory data in a single static cache is causing us immense headaches. The code is horrible:

  /** Returns an IndexReader reading the index in the given Directory. */
  public static IndexReader open(final Directory directory) throws IOException {
    synchronized (directory) {                // in- & inter-process sync
      return (IndexReader)new Lock.With(
          directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
          IndexWriter.COMMIT_LOCK_TIMEOUT) {
          public Object doBody() throws IOException {
            SegmentInfos infos = new SegmentInfos();
            infos.read(directory);
            if (infos.size() == 1) {          // index is optimized
              return new SegmentReader(infos, infos.info(0), true);
            } else {
              SegmentReader[] readers = new SegmentReader[infos.size()];
              for (int i = 0; i < infos.size(); i++)
                readers[i] = new SegmentReader(infos, infos.info(i),
                                               i == infos.size() - 1);
              return new SegmentsReader(infos, directory, readers);
            }
          }
        }.run();
    }
  }

where directory is passed in from the IndexReader constructor thus:

  return open(FSDirectory.getDirectory(path, false));

I don't know what the reasoning was behind using the IndexWriter timeouts when creating the FSDirectory stuff, *and* synchronising the whole thing around the directory as well - but it hurts when you have multiple indexes. All of this would go away if FSDirectory didn't maintain a cache. But it does.

Disabling the locks is both a good and a fundamentally bad idea. Good - it would wipe out this problem. Bad - it would suppress ALL locks on the system. I *think* we could get around this by using another system property such as 'disableLocksSoTheFSDirectoryCacheWorks'. Or something cleaner. For our system I was thinking of having a system property that lets us turn the commit.lock around FSDirectory cache creation on and off, but I would obviously like it included in core Lucene and hence thought that it was a worthwhile candidate for Lucene 2. There is a rough sketch of what I mean at the bottom of this mail, after the quoted message.

Sorry to get verbose...

Cheers

Pete Lewis

----- Original Message -----
From: "Christoph Goller" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Monday, September 13, 2004 9:26 AM
Subject: Re: Lock handling and Lucene 1.9 / 2.0


> Pete Lewis wrote:
> > Hi all
> >
> > IndexReader has to obtain a transitory exclusive read lock on a library. This is
> > fine, and results in the short-lived commit.lock file. However, if multiple
> > instantiations of Lucene IndexReaders are used over a *single* shared library
> > source (multiple libraries, single root) a spin can occur where multiple
> > IndexReaders sit in 1-second waits. This can be addressed by removing the need for
> > an exclusive read lock in the IndexReader - is this to be addressed for 1.4/1.9?
>
> Hi Pete,
>
> I do not understand the problem you are describing.
> What do you mean by a spin?
>
> The only problem I currently see is that if you open multiple
> readers at the same time and if opening takes a long time you
> could get a timeout IOException for some of the readers.
>
> Note that the short living commit lock is further used to
> commit changes to an index with either an IndexReader or
> an IndexWriter. Therefore I think it has to be exclusive.
>
> Christoph
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
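Here is the rough sketch I referred to above. It is purely illustrative: the property name and the openInternal() helper do not exist in Lucene, and this is not a tested patch against the real IndexReader. The idea is simply to keep the existing behaviour by default and let a deployment that is willing to open readers without the commit.lock opt in:

  // Hypothetical sketch only. The property name and openInternal() are made up;
  // they just illustrate the kind of switch being proposed.
  public static IndexReader open(final Directory directory) throws IOException {
    // Defaults to false, so existing deployments keep today's locking behaviour.
    boolean skipCommitLock =
        Boolean.getBoolean("org.apache.lucene.readOnlyCommitLockDisable");
    synchronized (directory) {                // in- & inter-process sync
      if (skipCommitLock) {
        // Read the segments without taking commit.lock. Only safe when nothing
        // is writing to (or deleting from) the index while searchers are opened.
        return openInternal(directory);
      }
      return (IndexReader) new Lock.With(
          directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
          IndexWriter.COMMIT_LOCK_TIMEOUT) {
          public Object doBody() throws IOException {
            return openInternal(directory);
          }
        }.run();
    }
  }

  // Hypothetical helper: the body of the existing doBody() quoted above, pulled
  // out so it can run with or without the commit.lock held.
  private static IndexReader openInternal(Directory directory) throws IOException {
    SegmentInfos infos = new SegmentInfos();
    infos.read(directory);
    if (infos.size() == 1)                    // index is optimized
      return new SegmentReader(infos, infos.info(0), true);
    SegmentReader[] readers = new SegmentReader[infos.size()];
    for (int i = 0; i < infos.size(); i++)
      readers[i] = new SegmentReader(infos, infos.info(i), i == infos.size() - 1);
    return new SegmentsReader(infos, directory, readers);
  }

Something along these lines would leave the default behaviour untouched, while letting a system like ours avoid the contention shown in the thread dumps without suppressing every other lock in the system, which is the problem with disabling locks wholesale.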