[jira] Updated: (LUCENE-1011) Two or more writers over NFS can cause index corruption

Michael McCandless (JIRA) Sun, 30 Sep 2007 04:12:53 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael McCandless updated LUCENE-1011:
---------------------------------------

    Attachment: LUCENE-1011.patch

Attaching patch.  All tests pass and I think this is ready for
commit.  I'll wait a few days before committing.

What's always tricky about debugging this kind of issue is figuring
out whether it's a locking problem (two writers incorrectly obtaining
the write lock at the same time) or an I/O "stale cache" issue.

To help with this, I created some basic instrumentation to "verify"
that locking is functioning correctly:

  * A new LockFactory called VerifyingLockFactory, which wraps an
    existing LockFactory and, every time a lock is obtained or
    released, contacts the LockVerifyServer (over a socket) to verify
    that the lock is not held by another process.  If it is held by
    another process, meaning the LockFactory is broken, an exception
    is thrown.

  * LockVerifyServer.java (main), which runs forever, accepting and
    verifying these socket connections.

  * A standalone (main) LockStressTest.java, whose sole purpose is to
    obtain and release a specified lock file very frequently.  You run
    this on multiple machines, pointing to the same lock file, to
    verify your LockFactory is working correctly.
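A minimal sketch of the wrapping idea in the first bullet, in plain Java with no Lucene dependency.  `SimpleLock`, `LockVerifier`, `VerifyingLock`, `BrokenLock` and `SharedFlagLock` are hypothetical names, not Lucene's classes, and an in-memory `LockVerifier` stands in for the socket round-trip to a verify server:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of wrapping a lock so every obtain/release is checked by a
 * central verifier. All names are hypothetical (not Lucene's API); the
 * in-memory LockVerifier stands in for a server reached over a socket.
 */
public class VerifyingLockSketch {

    /** Minimal lock abstraction, assumed for this sketch. */
    interface SimpleLock {
        boolean obtain();
        void release();
    }

    /** Stand-in for the verify server: records which client holds the lock. */
    static final class LockVerifier {
        private int holder = 0; // 0 means nobody holds the lock

        synchronized void obtained(int clientId) {
            if (holder != 0) {
                throw new IllegalStateException("client " + clientId
                        + " obtained a lock already held by client " + holder);
            }
            holder = clientId;
        }

        synchronized void released(int clientId) {
            if (holder != clientId) {
                throw new IllegalStateException("client " + clientId
                        + " released a lock held by client " + holder);
            }
            holder = 0;
        }
    }

    /** Delegates to the real lock, then reports the outcome to the verifier. */
    static final class VerifyingLock implements SimpleLock {
        private final SimpleLock delegate;
        private final LockVerifier verifier;
        private final int clientId;

        VerifyingLock(SimpleLock delegate, LockVerifier verifier, int clientId) {
            this.delegate = delegate;
            this.verifier = verifier;
            this.clientId = clientId;
        }

        @Override public boolean obtain() {
            boolean ok = delegate.obtain();
            if (ok) verifier.obtained(clientId); // throws if someone else holds it
            return ok;
        }

        @Override public void release() {
            verifier.released(clientId); // report before the real release
            delegate.release();
        }
    }

    /** Deliberately broken lock: always "succeeds", as if it saw a stale listing. */
    static final class BrokenLock implements SimpleLock {
        @Override public boolean obtain() { return true; }
        @Override public void release() { }
    }

    /** Working in-process lock; all clients share one flag. */
    static final class SharedFlagLock implements SimpleLock {
        private final AtomicBoolean held;
        SharedFlagLock(AtomicBoolean held) { this.held = held; }
        @Override public boolean obtain() { return held.compareAndSet(false, true); }
        @Override public void release() { held.set(false); }
    }

    /** Returns true if the verifier caught two clients holding the lock at once. */
    static boolean twoClientsConflict(SimpleLock lockForA, SimpleLock lockForB) {
        LockVerifier verifier = new LockVerifier();
        SimpleLock a = new VerifyingLock(lockForA, verifier, 1);
        SimpleLock b = new VerifyingLock(lockForB, verifier, 2);
        try {
            if (a.obtain()) {
                b.obtain();  // a correct lock returns false here
                a.release();
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean flag = new AtomicBoolean(false);
        System.out.println("broken lock caught: "
                + twoClientsConflict(new BrokenLock(), new BrokenLock()));
        System.out.println("working lock caught: "
                + twoClientsConflict(new SharedFlagLock(flag), new SharedFlagLock(flag)));
    }
}
```

The wrapper reports a release to the verifier before performing the real release; doing it the other way round would open a window where another client legitimately obtains the lock while the verifier still thinks the first client holds it, producing a false alarm.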

Using these additions, you can stress test the locking in your own
environment to determine whether your LockFactory is working
properly.
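A rough, single-JVM approximation of such a stress test, assuming a lock is "obtained" by atomically creating a lock file and "released" by deleting it.  Threads stand in for separate machines and a shared `AtomicInteger` stands in for the verify server; `LockFileStressSketch`, `run` and the `test.lock` file name are hypothetical.  In a real test each worker would be a separate process on a different machine, all pointing at the same lock file on the shared NFS directory:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * In-JVM sketch of a lock stress test (hypothetical, not Lucene's
 * LockStressTest): workers repeatedly obtain/release one lock file and
 * every successful obtain is checked against a shared "current holder".
 */
public class LockFileStressSketch {

    /** Returns how many times a worker obtained the lock while another held it. */
    static int run(Path lockFile, int workers, int attemptsPerWorker)
            throws InterruptedException {
        AtomicInteger holder = new AtomicInteger(0);      // 0 = unheld; stand-in for verify server
        AtomicInteger violations = new AtomicInteger(0);
        List<Thread> threads = new ArrayList<>();
        for (int w = 1; w <= workers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                for (int i = 0; i < attemptsPerWorker; i++) {
                    try {
                        Files.createFile(lockFile);        // atomic create-if-absent = obtain
                    } catch (FileAlreadyExistsException e) {
                        continue;                           // held by someone else; try again
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    // We believe we hold the lock; check that nobody else does.
                    if (!holder.compareAndSet(0, id)) {
                        violations.incrementAndGet();
                    }
                    Thread.yield();                          // let other workers run while we hold it
                    holder.compareAndSet(id, 0);             // clear our record before releasing
                    try {
                        Files.delete(lockFile);             // release
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return violations.get();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("lockstress");
        Path lockFile = dir.resolve("test.lock");
        System.out.println("violations: " + run(lockFile, 4, 10_000));
        Files.deleteIfExists(lockFile);
        Files.delete(dir);
    }
}
```

On a local disk, where `Files.createFile` performs the existence check and the creation as one atomic operation, the violation count should be zero; the interesting case is a shared mount where that guarantee may not hold across clients.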

I plan on committing these three source files so that others can
diagnose locking issues using the Lucene core jar.


> Two or more writers over NFS can cause index corruption
> -------------------------------------------------------
>
>                 Key: LUCENE-1011
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1011
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2, 2.3, 2.4, 2.9
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1011.patch
>
>
> When an index is used over NFS and more than one machine can act as
> a writer, such that they swap roles quickly, the index can become
> corrupt if the NFS client's directory cache is stale.
> Not all NFS clients show this.  Very recent versions of Linux's
> NFS client do not seem to show the issue, but slightly older ones do,
> and the latest Mac OS X one does as well.
> I've been working with Patrick Kimber, who provided a standalone test
> showing the problem (thank you Patrick!).  This came out of this
> thread:
>   
> http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=50680;page=1;sb=post_latest_reply;so=ASC;mh=25;list=lucene
> Note that the first issue in that discussion has been resolved
> (LUCENE-948).  This is a new issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

