But this method is called also for the regular lock file - if release() won't delete the file, then the next l.obtain() will return false.
Shai On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <[email protected]> wrote: > It shouldn't need too though - the native lock file is simply a dummy file > to apply the lock too - shouldn't matter if it already exists or not (though > it seems to in the current code). > > > On 4/28/10 10:22 AM, Shai Erera wrote: > >> If you won't delete the file, the next obtain will fail? >> >> On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <[email protected] >> <mailto:[email protected]>> wrote: >> >> I wonder if not being able to delete the file should throw a release >> failed exception at all. You have actually released the native lock >> - you where just not able to clean up - but that seems more like a >> warning situation than a failure. >> >> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> On 4/28/10 9:53 AM, Shai Erera wrote: >> >> I've hit it again and here's the full stacktrace (at least >> what's printed): >> >> [junit] Exception in thread "main" java.lang.RuntimeException: >> Failed to acquire random test lock; please verify filesystem for >> lock >> directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' >> supports >> locking >> [junit] at >> >> >> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88) >> [junit] at >> >> >> org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127) >> [junit] at >> >> >> org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74) >> [junit] at >> java.lang.J9VMInternals.newInstanceImpl(Native Method) >> [junit] at java.lang.Class.newInstance(Class.java:1325) >> [junit] at >> >> >> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248) >> [junit] at >> >> >> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214) >> [junit] at >> >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819) >> [junit] at >> >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909) >> [junit] at >> >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743) >> [junit] Caused by: >> org.apache.lucene.store.LockReleaseFailedException: failed to >> delete >> >> C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock >> [junit] at >> >> org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311) >> [junit] at >> >> >> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86) >> [junit] ... 9 more >> >> The exception is thrown from NativeFSLock.release() b/c it fails to >> delete the lock file. I think I know what the problem is - and >> it must >> be related to the large number of JVMs that are created w/ the >> parallel >> tests: >> * Suppose that JVM1 draws the number '1' for the test lock file - >> it >> thus creates lock1. >> * Now suppose that JVM2 draws the same number, magically somehow >> - it >> thus creates lock1 as well. >> * The code of acquireTestLock in NativeFSLockFactory looks like >> this: >> Lock l = makeLock(randomLockName); >> try { >> l.obtain(); >> l.release(); >> --> both will create the same test Lock file. Then l.obtain() >> probably >> returns false for one of them, but it's not checked. >> * Then in release there are a couple of things to note: >> 1) the method is synced on the instance, which does not affect >> the two JVMs. >> 2) suppose that both JVMs pass through the if (exists()) check. >> Then >> JVM1 releases the lock, and deletes the file. >> 3) Now JVM2 kicks in, calls lock.release() which has no effect >> (from the >> jdoc: "If this lock object is invalid then invoking this method >> has no >> effect." ). Then when it comes to path.delete(), the file isn't >> there, >> the method returns false and thus an exception is thrown ... >> >> This situation is extremely unlikely to happen, but still, it >> happens on >> my machine quite frequently since the parallel tests. I'm >> thinking that >> acquireTestLock should be less strict, but perhaps we can fix it >> if we >> replace the line: >> if (!path.delete()) (line 310) >> with this >> if (!path.delete() && path.exists()) >> >> I.e., if the lock file fails to delete but is still there, throw >> the >> exception ... >> >> What do you think? >> >> Shai >> >> On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <[email protected] >> <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>>> wrote: >> >> >> >> On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda >> <[email protected] <mailto:[email protected]> >> <mailto:[email protected] >> >> <mailto:[email protected]>>> wrote: >> >> >> I've had similar random failures on Mac OS X 10.6. They >> started >> happening recently, about two weeks ago. >> >> >> Thats just too randomly close to when i last worked on this >> build >> system stuff for LUCENE-1709... perhaps I made it worse >> instead of >> better. >> >> -- >> Robert Muir >> [email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>> >> >> >> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> <mailto:[email protected]> >> >> For additional commands, e-mail: [email protected] >> <mailto:[email protected]> >> >> >> > > -- > - Mark > > http://www.lucidimagination.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
