Hi Christoph

We are running a cluster of 4 multi-processor Sun servers with BEA WebLogic. We are 
using Lucene for the search component and have multiple indexes on a SAN, where all 
indexes are accessible from all of the servers in the cluster.

During performance testing we found that Lucene seemed to be taking a lot of 
resources. When the system was "stressed" we took a number of thread dumps; most of 
the threads doing work appeared to be tied up within Lucene. I've included a couple 
of examples from the dumps for you to look at.


"ExecuteThread: '20' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0xcbee60 nid=0x22 waiting for monitor entry [8c2fd000..8c2ffc24]
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:127)
        - waiting to lock <be2641c8> (a org.apache.lucene.store.FSDirectory)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:101)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
        at com.uptima.usi.searchAPI.Search.performSearch(Unknown Source)



"ExecuteThread: '15' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x7d1b58 nid=0x1d waiting on condition [8c7fd000..8c7ffc24]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.lucene.store.Lock$With.run(Lock.java:109)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:103)
        - locked <c895c5a8> (a org.apache.lucene.store.FSDirectory)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
        at com.uptima.usi.searchAPI.Search.performSearch(Unknown Source)

These are just examples; each thread dump typically has 5-10 threads tied up in this 
way. Obviously code which is doing a Thread.sleep on the server side is a bit worrying!

Therefore we dug in a bit more......

Long answer - there's a heap of horrible, horrible code in FSDirectory that tries 
to be clever, and I think it's not quite working correctly.

There are two types of lock - write.lock and commit.lock. The write.lock is used 
exclusively for synchronising the indexing of documents and has *no* impact on 
searching whatsoever.

Commit.lock is another little story. Commit.lock is used for two things - stopping 
indexing processes from overwriting segments that another one is currently using, and 
stopping IndexReaders from overwriting each other when they delete entries (don't 
even start asking me why a bloody IndexReader can delete documents).

*However*, there's another naughty little usage that isn't listed in any of the 
documentation, and here it is...

Doug Cutting wrote FSDirectory in such a way that it caches directories. Hence, if 
FSDirectory is called more than once with the same directory, the FSDirectory class 
uses a static Hashtable to return the cached instance. However, if FSDirectory is 
called with a *different* directory, it engages a commit.lock while it updates the 
cache. That Hashtable is *also* synchronized.
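To make the pattern concrete, here's roughly what that caching boils down to (a simplified, stdlib-only sketch - the class and method names here are mine, not the actual Lucene code):

```java
import java.util.Hashtable;

// Simplified sketch of an FSDirectory-style cache: one shared instance per
// directory path, held in a static (and itself synchronized) Hashtable.
class CachedDirectory {
    private static final Hashtable<String, CachedDirectory> CACHE = new Hashtable<>();
    private final String path;

    private CachedDirectory(String path) { this.path = path; }

    public String getPath() { return path; }

    // Returns the cached instance for this path, creating it on first use.
    // The check-then-put must be atomic, so every caller takes the same
    // monitor - which is exactly where contention shows up when many
    // threads open directories concurrently.
    public static CachedDirectory getDirectory(String path) {
        synchronized (CACHE) {
            CachedDirectory dir = CACHE.get(path);
            if (dir == null) {
                dir = new CachedDirectory(path);
                CACHE.put(path, dir);
            }
            return dir;
        }
    }
}
```

Same path, same instance; a new path takes the global monitor and mutates the shared cache.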

Creating an IndexSearcher creates (within itself) an IndexReader to read the index. 
The first thing the IndexReader does is grab an FSDirectory for the index directory - 
if you are using Lucene with a single index, there is never a problem - it is read 
once, then cached.

Our search process works by searching across all the libraries selected sequentially, 
building a results list and then culling the results it doesn't need. To search it 
loops through each library and creates an IndexSearcher to get at the data.
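The shape of that loop is roughly this (names are ours, not Lucene's; the factory stands in for constructing a new IndexSearcher per library):

```java
import java.util.ArrayList;
import java.util.List;

// Rough shape of our search flow: loop over the selected libraries, open a
// searcher per library directory, gather hits, then cull what we don't need.
class MultiLibrarySearch {
    // Stands in for "new IndexSearcher(directory)" in the real code.
    interface SearcherFactory { String open(String directory); }

    static List<String> searchAll(List<String> libraryDirs, SearcherFactory factory) {
        List<String> allHits = new ArrayList<>();
        for (String dir : libraryDirs) {
            String searcher = factory.open(dir); // one fresh open per library, per search
            allHits.add("hits-from-" + searcher); // real code runs the query here
        }
        return allHits; // culling of unneeded results follows in our code
    }
}
```

Every iteration opens a searcher over a *different* directory, so every iteration goes back through FSDirectory.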

Starting to see the issue yet? Because each library is in a different directory, the 
internal call to the IndexReader - which then gets an FSDirectory - causes the 
FSDirectory to update its singular cache, which forces a commit.lock to appear.

Doug Cutting's little bit of 'neat' code for singularly caching the data within an 
FSDirectory is causing us immense headaches. The code is horrible:

/** Returns an IndexReader reading the index in the given Directory. */
  public static IndexReader open(final Directory directory) throws IOException {
    synchronized (directory) {                  // in- & inter-process sync
      return (IndexReader)new Lock.With(
          directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
          IndexWriter.COMMIT_LOCK_TIMEOUT) {
          public Object doBody() throws IOException {
            SegmentInfos infos = new SegmentInfos();
            infos.read(directory);
            if (infos.size() == 1) {            // index is optimized
              return new SegmentReader(infos, infos.info(0), true);
            } else {
              SegmentReader[] readers = new SegmentReader[infos.size()];
              for (int i = 0; i < infos.size(); i++)
                readers[i] = new SegmentReader(infos, infos.info(i),
                                               i == infos.size() - 1);
              return new SegmentsReader(infos, directory, readers);
            }
          }
        }.run();
    }
  }

Where directory is passed in from the constructor to IndexReader thus:

  return open( FSDirectory.getDirectory( path, false ) );

I don't know what the reasoning was behind using the IndexWriter timeouts when 
creating the FSDirectory stuff, *and* synchronising the whole thing around the 
directory as well - but it hurts when you have multiple indexes.

All of this would go away if the FSDirectory didn't maintain a cache. But it does.
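On our side, the obvious mitigation is to stop re-opening searchers on every request: open each library's searcher once and reuse it, so the lock-guarded FSDirectory path is only ever hit the first time a library is seen. A hypothetical sketch (the generic type stands in for IndexSearcher, and the opener for its constructor):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical application-level workaround: a pool that opens each
// library's searcher at most once and hands back the cached instance
// thereafter. The expensive, lock-contended open happens only on the
// first request for a given directory.
class SearcherPool<S> {
    private final Map<String, S> pool = new ConcurrentHashMap<>();
    private final Function<String, S> opener;

    SearcherPool(Function<String, S> opener) { this.opener = opener; }

    // computeIfAbsent runs the opener at most once per directory,
    // even under concurrent access.
    S forLibrary(String directory) {
        return pool.computeIfAbsent(directory, opener);
    }
}
```

This doesn't fix FSDirectory itself, of course, and you'd still need a strategy for refreshing searchers when an index changes - it just stops every search from hammering the same global cache.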

Disabling the locks is both a good and a fundamentally bad idea. Good - it would wipe 
out this problem. Bad - it would suppress ALL locks on the system. I *think* we could 
get around this by using another system property such as 
'disableLocksSoTheFSDirectoryCacheWorks'. Or something cleaner.

For our system I was thinking of adding a system property that lets us turn on/off 
the commit.lock around FSDirectory cache creation - but I would obviously like it 
included in core Lucene, and hence thought it was a worthwhile candidate for 
Lucene 2.
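A minimal sketch of the kind of switch I mean - the property name is purely a placeholder, not a real Lucene option:

```java
// Hypothetical finer-grained switch: disable only the commit.lock taken
// around FSDirectory cache creation, rather than suppressing ALL locks
// the way the existing disable-locks property does.
class LockPolicy {
    // Boolean.getBoolean returns true only when the named system property
    // exists and equals "true"; default is therefore "lock enabled".
    static boolean commitLockEnabled() {
        return !Boolean.getBoolean("lucene.disableCommitLockForCache");
    }
}
```

The open/getDirectory paths would then consult this before wrapping their work in a Lock.With, while write.lock handling stays untouched.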

Sorry to get verbose...

Cheers

Pete Lewis

----- Original Message ----- 
From: "Christoph Goller" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Monday, September 13, 2004 9:26 AM
Subject: Re: Lock handling and Lucene 1.9 / 2.0


> Pete Lewis wrote:
> > Hi all
> > 
> > IndexReader has to obtain a transitory exclusive readlock on a library. This is 
> > fine, and results in the short lived commit.lock file. However, if multiple 
> > instantiations of LUCENE IndexReaders are used over a *single* shared library 
> > source (multiple libraries, single root) a spin can occur where multiple 
> > IndexReaders sit in 1 second waits. This can be addressed by removing the need for 
> > an exclusive readlock in the IndexReader - is this to be addressed for 1.4/1.9?
> 
> Hi Pete,
> 
> I do not understand the problem you are describing.
> What do you mean by a spin?
> 
> The only problem I currently see is that if you open multiple
> readers at the same time and if opening takes a long time you
> could get a timeout IOException for some of the readers.
> 
> Note that the short living commit lock is further used to
> commit changes to an index with either an IndexReader or
> an IndexWriter. Therefore I think it has to be exclusive.
> 
> Christoph
> 
> 
