Re: LuceneJUnitResultFormatter sometimes fails to lock

Shai Erera Wed, 28 Apr 2010 10:24:39 -0700

Mike - I'm pretty sure that's what happened. The reason is
that even w/ the reported failure, the lock dir is empty when the
tests finish and the lock file isn't there. I believe that if a
collision were not the cause, I would have seen the test lock file
left in there?
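
To make the collision scenario concrete, here is a minimal sketch (not the actual NativeFSLockFactory code). The class TestLockNames, its methods and the "lucene-...-test.lock" name pattern are made up for illustration. It only shows that two generators that end up with the same seed produce the same lock file name, and one possible way to seed explicitly, as Mike suggested.

```java
import java.lang.management.ManagementFactory;
import java.util.Random;

// Hypothetical helper, for illustration only.
final class TestLockNames {

    // Builds a lock file name from the next int of the given generator.
    // The name pattern is an assumption, not taken from the Lucene source.
    static String lockName(Random random) {
        return "lucene-"
            + Integer.toString(random.nextInt(), Character.MAX_RADIX)
            + "-test.lock";
    }

    // One possible explicit seed: mix the current nano time with the
    // runtime name of this JVM (typically "pid@host", but that format
    // is not guaranteed), so that JVMs started at the same moment are
    // less likely to share a seed.
    static long explicitSeed() {
        String jvmName = ManagementFactory.getRuntimeMXBean().getName();
        return System.nanoTime() ^ ((long) jvmName.hashCode() << 32);
    }

    static String newLockName() {
        return lockName(new Random(explicitSeed()));
    }
}
```

Two JVMs that happen to seed identically would both compute the same name and then race on the same test lock file, which would match the failure with nothing left behind in the lock dir.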


So overall these changes will 99.9% of the time delete the lock file.
It's in those (I agree) super rare cases that the rest of the code
will be invoked.
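
For reference, the semantics I have in mind can be sketched roughly as below. This is not the patch itself: BestEffortNativeLock and deleteWithRetry are hypothetical names, and the sketch assumes plain java.nio file locking. obtain() never looks at whether the file already exists, release() treats the delete as best-effort, and the test-lock cleanup retries the delete once before falling back to deleteOnExit.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical sketch of the proposed behavior, for illustration only.
final class BestEffortNativeLock {
    private final File path;
    private RandomAccessFile file;
    private FileChannel channel;
    private FileLock lock;

    BestEffortNativeLock(File path) {
        this.path = path;
    }

    // Really attempts to take the native lock; a pre-existing lock file
    // is simply reused and is not treated as "already locked".
    synchronized boolean obtain() throws IOException {
        if (lock != null) {
            return false; // this instance already holds the lock
        }
        file = new RandomAccessFile(path, "rw"); // creates the file if missing
        channel = file.getChannel();
        try {
            lock = channel.tryLock(); // null if another process holds it
        } catch (OverlappingFileLockException e) {
            lock = null; // held elsewhere in this same JVM
        }
        if (lock == null) {
            closeQuietly();
            return false;
        }
        return true;
    }

    // Releases the native lock; deleting the file is best-effort only.
    synchronized void release() throws IOException {
        if (lock == null) {
            return;
        }
        try {
            lock.release();
        } finally {
            lock = null;
            closeQuietly();
        }
        path.delete(); // result ignored on purpose
    }

    // Cleanup for the test lock: retry once after a short wait,
    // then leave the file to deleteOnExit.
    static void deleteWithRetry(File f, long waitMs) {
        if (f.delete() || !f.exists()) {
            return;
        }
        try {
            Thread.sleep(waitMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!f.delete() && f.exists()) {
            f.deleteOnExit();
        }
    }

    private void closeQuietly() {
        try {
            if (channel != null) channel.close();
        } catch (IOException ignored) {
            // best effort
        }
        try {
            if (file != null) file.close();
        } catch (IOException ignored) {
            // best effort
        }
        channel = null;
        file = null;
    }
}
```

In the common case the delete in release() just succeeds, so the retry and deleteOnExit path is only reached when something else (e.g. an AntiVirus) grabs the file in between.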

In addition, I don't have any other explanation for why this sometimes
happens; it all started after the tests were parallelized. And since then
it has happened too many times ...

Shai

On Wednesday, April 28, 2010, Michael McCandless
<[email protected]> wrote:
> Nice!
>
> Mike
>
> On Wed, Apr 28, 2010 at 12:54 PM, Robert Muir <[email protected]> wrote:
>> As far as the build system goes, I implemented the two ideas mentioned
>> earlier in this message (not creating a new Formatter for each test, and not
>> spawning 26 jvms for each batch)
>>
>> Jira is down, but if you want to help test you can try a patch here:
>> http://pastebin.com/iqwb73H2 (click Raw/Download)
>>
>> Additionally this cuts about 1:20 off the total Solrcene 'ant clean test' for me.
>>
>> before:
>> BUILD SUCCESSFUL
>> Total time: 7 minutes 42 seconds
>>
>> after:
>> BUILD SUCCESSFUL
>> Total time: 6 minutes 23 seconds
>>
>> On Wed, Apr 28, 2010 at 12:25 PM, Michael McCandless
>> <[email protected]> wrote:
>>>
>>> I think these are good changes to NativeFSLockFactory.
>>>
>>> But: the chances that N JVMs launched at once would conflict on the
>>> randomly generated lock file name should be minuscule... though it
>>> does depend on how good new Random() is at seeding itself.  Do we
>>> really think this explains your exceptions Shai?  (And, if so, even w/
>>> these changes, the conflict could still happen?)  Maybe we should
>>> explicitly seed it?
>>>
>>> Mike
>>>
>>> On Wed, Apr 28, 2010 at 11:22 AM, Shai Erera <[email protected]> wrote:
>>> > I'd like to summarize the IRC discussion Mark and I had:
>>> >
>>> > The lock file's existence in the directory should not prevent obtain() from
>>> > obtaining a lock. That's the whole difference between Simple
>>> > and
>>> > Native. So we should make a best-effort to delete it. If the delete
>>> > fails on
>>> > release(), then ok. On obtain(), we won't return false if the lock
>>> > exists,
>>> > but attempt to really obtain it and fail appropriately.
>>> >
>>> > While the previously proposed fix (add "&& path.exists()" to release())
>>> > might work most of the time, it will only work "most of the time".
>>> > I.e.,
>>> > between release() and delete(), an external process, like AntiVirus,
>>> > might
>>> > lock the file, and delete will fail, but the file will still be there,
>>> > and
>>> > we'll throw an exception still.
>>> >
>>> > So, the proposed changes are:
>>> > * release() is allowed to fail to delete the lock file.
>>> > * obtain() should not return false if the lock file exists - it should
>>> > really attempt to obtain it.
>>> > * in acquireTestLock(), if after release() is called, the lock file
>>> > still
>>> > exists, we'll retry the delete a few ms later, and if that fails, call
>>> > deleteOnExit.
>>> >
>>> > How's that sound?
>>> >
>>> > Shai
>>> >
>>> > On Wed, Apr 28, 2010 at 5:58 PM, Mark Miller <[email protected]>
>>> > wrote:
>>> >>
>>> >> I don't follow. The simple lock impl must delete the file, but the
>>> >> native
>>> >> impl should not have to. The file has nothing to do with the lock - it's
>>> >> just
>>> >> the medium to ask for and release the lock. If it already exists, you
>>> >> don't
>>> >> have to create it - you can just use it to try and get a native lock.
>>> >> Likewise, it doesn't need to be removed to release a native lock - you
>>> >> simply call unlock on it.
>>> >>
>>> >> On 4/28/10 10:34 AM, Shai Erera wrote:
>>> >>>
>>> >>> But this method is called also for the regular lock file - if
>>> >>> release()
>>> >>> won't delete the file, then the next l.obtain() will return false.
>>> >>>
>>> >>> Shai
>>> >>>
>>> >>> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <[email protected]
>
