Nice!

Mike

On Wed, Apr 28, 2010 at 12:54 PM, Robert Muir <[email protected]> wrote:
> As far as the build system goes, I implemented the two ideas mentioned
> earlier in this message (not creating a new Formatter for each test, and not
> spawning 26 jvms for each batch)
>
> Jira is down, but if you want to help test you can try a patch here:
> http://pastebin.com/iqwb73H2 (click Raw/Download)
>
> Additionally this cuts 1:20 off the total Solrcene 'ant clean test' for me.
>
> before:
> BUILD SUCCESSFUL
> Total time: 7 minutes 42 seconds
>
> after:
> BUILD SUCCESSFUL
> Total time: 6 minutes 23 seconds
>
> On Wed, Apr 28, 2010 at 12:25 PM, Michael McCandless
> <[email protected]> wrote:
>>
>> I think this are good changes to NativeFSLockFactory.
>>
>> But: the chances that N JVMs launched at once would conflict on the
>> randomly generated lock file name should be miniscule... though it
>> does depend on how good new Random() is at seeding itself.  Do we
>> really think this explains your exceptions Shai?  (And, if so, even w/
>> these changes, the conflict could still happen?)  Maybe we should
>> explicitly seed it?
>>
>> Mike
>>
>> On Wed, Apr 28, 2010 at 11:22 AM, Shai Erera <[email protected]> wrote:
>> > I'd like to summarize the IRC discussion Mark and I had:
>> >
>> > The lock file's existence in the directory should not fail obtain() from
>> > retrieving obtaining a lock. That's the whole difference between Simple
>> > and
>> > Native. So we should make a best-effort to delete it. If the delete
>> > fails on
>> > release(), then ok. On obtain(), we won't return false if the lock
>> > exists,
>> > but attempt to really obtain it and fail appropriately.
>> >
>> > While the previously proposed fix (add "&& path.exists()" to release())
>> > might work most of the times, it will only work "most of the times".
>> > I.e.,
>> > between release() and delete(), an external process, like AntiVirus,
>> > might
>> > lock the file, and delete will fail, but the file will still be there,
>> > and
>> > we'll throw an exception still.
>> >
>> > So, the proposed changes are:
>> > * release() is allowed to fail to delete the lock file.
>> > * obtain() should not return false if the lock file exists - it should
>> > really attempt to obtain it.
>> > * in acquireTestLock(), if after release() is called, the lock file
>> > still
>> > exists, we'll retry the delete few ms later, and if that fails, call
>> > deleteOnExit.
>> >
>> > How's that sound?
>> >
>> > Shai
>> >
>> > On Wed, Apr 28, 2010 at 5:58 PM, Mark Miller <[email protected]>
>> > wrote:
>> >>
>> >> I don't follow. The simple lock impl must delete the file, but the
>> >> native
>> >> impl should not have to. The file has nothing to do with the lock - its
>> >> just
>> >> the medium to ask for and release the lock. If it already exists, you
>> >> don't
>> >> have to create it - you can just use it to try and get a native lock.
>> >> Likewise, it doesn't need to be removed to release a native lock - you
>> >> simply call unlock on it.
>> >>
>> >> On 4/28/10 10:34 AM, Shai Erera wrote:
>> >>>
>> >>> But this method is called also for the regular lock file - if
>> >>> release()
>> >>> won't delete the file, then the next l.obtain() will return false.
>> >>>
>> >>> Shai
>> >>>
>> >>> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <[email protected]
>> >>> <mailto:[email protected]>> wrote:
>> >>>
>> >>>    It shouldn't need too though - the native lock file is simply a
>> >>>    dummy file to apply the lock too - shouldn't matter if it already
>> >>>    exists or not (though it seems to in the current code).
>> >>>
>> >>>
>> >>>    On 4/28/10 10:22 AM, Shai Erera wrote:
>> >>>
>> >>>        If you won't delete the file, the next obtain will fail?
>> >>>
>> >>>        On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller
>> >>>        <[email protected] <mailto:[email protected]>
>> >>>        <mailto:[email protected] <mailto:[email protected]>>>
>> >>>        wrote:
>> >>>
>> >>>            I wonder if not being able to delete the file should throw
>> >>> a
>> >>>        release
>> >>>            failed exception at all. You have actually released the
>> >>>        native lock
>> >>>            - you where just not able to clean up - but that seems more
>> >>>        like a
>> >>>            warning situation than a failure.
>> >>>
>> >>>
>> >>>            --
>> >>>            - Mark
>> >>>
>> >>>        http://www.lucidimagination.com
>> >>>
>> >>>            On 4/28/10 9:53 AM, Shai Erera wrote:
>> >>>
>> >>>                I've hit it again and here's the full stacktrace (at
>> >>> least
>> >>>                what's printed):
>> >>>
>> >>>                     [junit] Exception in thread "main"
>> >>>        java.lang.RuntimeException:
>> >>>                Failed to acquire random test lock; please verify
>> >>>        filesystem for
>> >>>                lock
>> >>>                directory
>> >>>        'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock'
>> >>>                supports
>> >>>                locking
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>> >>>                     [junit]     at
>> >>>                java.lang.J9VMInternals.newInstanceImpl(Native Method)
>> >>>                     [junit]     at
>> >>>        java.lang.Class.newInstance(Class.java:1325)
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>> >>>                     [junit] Caused by:
>> >>>                org.apache.lucene.store.LockReleaseFailedException:
>> >>>        failed to delete
>> >>>
>> >>>
>> >>>
>> >>>  C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>> >>>                     [junit]     at
>> >>>
>> >>>
>> >>>
>> >>>  org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>> >>>                     [junit]     ... 9 more
>> >>>
>> >>>                The exception is thrown from NativeFSLock.release() b/c
>> >>>        it fails to
>> >>>                delete the lock file. I think I know what the problem
>> >>> is
>> >>>        - and
>> >>>                it must
>> >>>                be related to the large number of JVMs that are created
>> >>>        w/ the
>> >>>                parallel
>> >>>                tests:
>> >>>                * Suppose that JVM1 draws the number '1' for the test
>> >>>        lock file - it
>> >>>                thus creates lock1.
>> >>>                * Now suppose that JVM2 draws the same number,
>> >>> magically
>> >>>        somehow
>> >>>                - it
>> >>>                thus creates lock1 as well.
>> >>>                * The code of acquireTestLock in NativeFSLockFactory
>> >>>        looks like
>> >>>                this:
>> >>>                     Lock l = makeLock(randomLockName);
>> >>>                     try {
>> >>>                       l.obtain();
>> >>>                       l.release();
>> >>>                --> both will create the same test Lock file. Then
>> >>>        l.obtain()
>> >>>                probably
>> >>>                returns false for one of them, but it's not checked.
>> >>>                * Then in release there are a couple of things to note:
>> >>>                1) the method is synced on the instance, which does not
>> >>>        affect
>> >>>                the two JVMs.
>> >>>                2) suppose that both JVMs pass through the if
>> >>> (exists())
>> >>>        check. Then
>> >>>                JVM1 releases the lock, and deletes the file.
>> >>>                3) Now JVM2 kicks in, calls lock.release() which has no
>> >>>        effect
>> >>>                (from the
>> >>>                jdoc: "If this lock object is invalid then invoking
>> >>> this
>> >>>        method
>> >>>                has no
>> >>>                effect." ). Then when it comes to path.delete(), the
>> >>>        file isn't
>> >>>                there,
>> >>>                the method returns false and thus an exception is
>> >>> thrown
>> >>> ...
>> >>>
>> >>>                This situation is extremely unlikely to happen, but
>> >>>        still, it
>> >>>                happens on
>> >>>                my machine quite frequently since the parallel tests.
>> >>> I'm
>> >>>                thinking that
>> >>>                acquireTestLock should be less strict, but perhaps we
>> >>>        can fix it
>> >>>                if we
>> >>>                replace the line:
>> >>>                      if (!path.delete()) (line 310)
>> >>>                with this
>> >>>                      if (!path.delete() && path.exists())
>> >>>
>> >>>                I.e., if the lock file fails to delete but is still
>> >>>        there, throw the
>> >>>                exception ...
>> >>>
>> >>>                What do you think?
>> >>>
>> >>>                Shai
>> >>>
>> >>>                On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir
>> >>>        <[email protected] <mailto:[email protected]>
>> >>>        <mailto:[email protected] <mailto:[email protected]>>
>> >>>        <mailto:[email protected] <mailto:[email protected]>
>> >>>        <mailto:[email protected] <mailto:[email protected]>>>> wrote:
>> >>>
>> >>>
>> >>>
>> >>>                    On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda
>> >>>        <[email protected] <mailto:[email protected]>
>> >>>        <mailto:[email protected]
>> >>> <mailto:[email protected]>>
>> >>>        <mailto:[email protected]
>> >>> <mailto:[email protected]>
>> >>>
>> >>>        <mailto:[email protected]
>> >>>        <mailto:[email protected]>>>> wrote:
>> >>>
>> >>>
>> >>>                        I've had similar random failures on Mac OS X
>> >>>        10.6. They
>> >>>                started
>> >>>                        happening recently, about two weeks ago.
>> >>>
>> >>>
>> >>>                    Thats just too randomly close to when i last worked
>> >>>        on this
>> >>>                build
>> >>>                    system stuff for LUCENE-1709... perhaps I made it
>> >>> worse
>> >>>                instead of
>> >>>                    better.
>> >>>
>> >>>                    --
>> >>>                    Robert Muir
>> >>>        [email protected] <mailto:[email protected]>
>> >>>        <mailto:[email protected] <mailto:[email protected]>>
>> >>>        <mailto:[email protected] <mailto:[email protected]>
>> >>>        <mailto:[email protected] <mailto:[email protected]>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>  ---------------------------------------------------------------------
>> >>>            To unsubscribe, e-mail: [email protected]
>> >>>        <mailto:[email protected]>
>> >>>        <mailto:[email protected]
>> >>>        <mailto:[email protected]>>
>> >>>
>> >>>            For additional commands, e-mail: [email protected]
>> >>>        <mailto:[email protected]>
>> >>>        <mailto:[email protected]
>> >>>        <mailto:[email protected]>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>    --
>> >>>    - Mark
>> >>>
>> >>>    http://www.lucidimagination.com
>> >>>
>> >>>
>> >>>  ---------------------------------------------------------------------
>> >>>    To unsubscribe, e-mail: [email protected]
>> >>>    <mailto:[email protected]>
>> >>>    For additional commands, e-mail: [email protected]
>> >>>    <mailto:[email protected]>
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> - Mark
>> >>
>> >> http://www.lucidimagination.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
>
>
> --
> Robert Muir
> [email protected]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to