Re: Lock handling and Lucene 1.9 / 2.0

Pete Lewis Tue, 14 Sep 2004 02:36:07 -0700

Hi Christoph

Let's stand back a second and ask: why do we have commit locks when searching?


The answer comes down to how FSDirectory is handled, and the methods it uses
are not J2EE compliant.

We could open another can of worms and ask why the IndexReader deletes at
all, but I won't go into that argument again here...

The bottom line is that we need the ability to search without waiting on a
commit lock, and FSDirectory is where the problems lie.  We could hack the
code to make it work for our particular application.  However, what I've
been trying to get across is the need for a method that lets users just
search (not delete) without waiting on the commit lock, that is J2EE
compliant, and that is appropriate for clustered implementations.  I think
this should be a candidate for Lucene 1.9 / 2.0.

You say that it shouldn't take long to wait.  A 1-second spin lock per index
per process is an eternity when you are trying to scale to potentially
thousands of users.
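To make the cost of a polling wait concrete, here is a minimal Java sketch of a file-based spin lock. It assumes, hypothetically, that a waiter tries to create a lock file and, on failure, sleeps a fixed one-second interval before retrying. The class `SpinFileLock` and the constant `POLL_INTERVAL_MS` are illustrative names, not Lucene's API.

```java
import java.io.File;
import java.io.IOException;

/**
 * Hypothetical sketch of a file-based "spin" lock in the style of a commit lock.
 * Names are illustrative only; this is not Lucene's Lock class.
 */
class SpinFileLock {
    // Assumed poll interval: a waiter sleeps this long between attempts.
    static final long POLL_INTERVAL_MS = 1000;

    private final File lockFile;

    SpinFileLock(File lockFile) {
        this.lockFile = lockFile;
    }

    /** One non-blocking attempt: createNewFile succeeds only if the file did not exist. */
    boolean tryObtain() throws IOException {
        return lockFile.createNewFile();
    }

    /**
     * Retry until the lock is obtained or timeoutMs has elapsed.
     * Every failed attempt before the deadline costs a full POLL_INTERVAL_MS
     * sleep, so even a briefly held lock can delay a waiter by up to a second.
     */
    boolean obtain(long timeoutMs) throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (tryObtain()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(POLL_INTERVAL_MS);
        }
    }

    /** Release by deleting the lock file. */
    void release() {
        lockFile.delete();
    }
}
```

With this scheme, a reader that loses the race waits a whole poll interval even if the holder releases the lock a millisecond later, and that wait is paid per index and per process. (The JDK documentation for `File.createNewFile` recommends `java.nio.channels.FileLock` rather than `createNewFile` for file locking.)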

Cheers

Pete Lewis

----- Original Message ----- 
From: "Christoph Goller" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Tuesday, September 14, 2004 8:57 AM
Subject: Re: Lock handling and Lucene 1.9 / 2.0


> Pete Lewis wrote:
> > Hi Christoph
> >
> > The directory caching is applied *across* class instances (the directory
> > is instantiated once): there is a single cache, and it is updated if
> > FSDirectory is called against a different index.
>
> Yes. That's how we guarantee that every process has at most one FSDirectory
> instance for every index.
>
> >
> > Multiple indexes will *always* cause directory caching upon calls to
> > FSDirectory. Our searches are made sequentially against all libraries
> > (or a selection of libraries), and this sequential call to FSDirectory
> > causes the cache to be updated; it's very, very rare that the cache will
> > remain the same between two calls to get FSDirectories. This caching
> > *is* synchronized using the commit.lock (see the code), and two processes
> > will obtain two different (completely separate) caches *but* are tied
> > together by the commit lock. This is what causes the spin.
>
> As I said, synchronization on directory instances is the in-process
> synchronization mechanism. The commit.lock mechanism is the inter-process
> synchronization mechanism. If you use one process (independent JVM) for
> more than one search, you will get hits in the directories cache.
>
> The in-process mechanism is the first synchronization made when
> opening an IndexReader. Obviously you need a directory instance to
> synchronize on, and this instance has to be unique for your process.
> In order to get the directory instance we synchronize on the static
> directory cache, and this may be a bottleneck, since all opening
> threads, regardless of the index they are trying to open, have to
> synchronize on the static cache. However, accessing the hashtable should
> be fast, shouldn't it? In order to get the directory instance you do not
> need a commit.lock. That's what I meant by:
>
> >>FSDirectory.getDirectory has nothing to do with a commit.lock!
> >
> >
> > Err, wrong. The directory.makeLock(IndexWriter.COMMIT_LOCK_NAME) call
> > from within the IndexReader.open routine ties the commit.lock to the
> > FSDirectory by synchronising the code around a *static* instance of the
> > directory object (see the code!!).
>
> After synchronizing all threads of one process that try to open the same
> index (on the directory), the inter-process mechanism with
> the commit.lock is applied in order to synchronize with other processes
> that might try to open the same index.
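The three synchronization points described above (the static directory cache, the monitor on the per-process directory instance, and the inter-process commit.lock) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Lucene's FSDirectory or IndexReader code: the class `DirectoryOpenSketch`, the canonical-path cache key, the lock file's location inside the index directory and the one-second poll interval are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the three synchronization steps; not Lucene's code. */
class DirectoryOpenSketch {

    /** Stand-in for a directory object: at most one instance per index per JVM. */
    static final class Dir {
        final File path;
        Dir(File path) { this.path = path; }
    }

    // Step 1: a process-wide cache guarded by one static monitor. Every opening
    // thread passes through here, whichever index it wants.
    private static final Map<File, Dir> DIRECTORIES = new HashMap<>();

    static Dir getDirectory(File path) throws IOException {
        File key = path.getCanonicalFile();   // assumed key: canonical path
        synchronized (DIRECTORIES) {
            Dir d = DIRECTORIES.get(key);
            if (d == null) {
                d = new Dir(key);
                DIRECTORIES.put(key, d);
            }
            return d;
        }
    }

    /** Steps 2 and 3: in-process monitor on the directory, then the inter-process lock file. */
    static String open(File path, long lockTimeoutMs) throws IOException, InterruptedException {
        Dir dir = getDirectory(path);
        synchronized (dir) {                            // step 2: threads of this JVM, same index
            File lockFile = new File(dir.path, "commit.lock");  // assumed location
            long deadline = System.currentTimeMillis() + lockTimeoutMs;
            while (!lockFile.createNewFile()) {         // step 3: other JVMs, same index
                if (System.currentTimeMillis() >= deadline) {
                    throw new IOException("Lock obtain timed out: " + lockFile);
                }
                Thread.sleep(1000);                     // assumed poll interval
            }
            try {
                return "opened " + dir.path;            // placeholder for reading the index files
            } finally {
                lockFile.delete();                      // release the inter-process lock
            }
        }
    }
}
```

Only step 3 touches the commit.lock; steps 1 and 2 are plain Java monitors, which is the distinction between in-process and inter-process synchronization drawn in this thread.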
>
> So there are 3 places where synchronization is done. Could you please tell
> me again which one is, in your opinion, the most critical, and whether you
> have any ideas for how we could improve synchronization?
>
> One of your ideas was to turn off the commit.lock mechanism. However, I think
> we cannot give up inter-process synchronization...
>
> Furthermore, what are the implications of these synchronization problems for
> your application? Do they just make application start-up slow, or do they
> slow down every search? This is, of course, assuming you reuse processes and
> searcher instances for more than one query/search. Anything else is simply
> using Lucene in the wrong way.
>
> regards,
> Christoph
