On 25 December 2011 21:54, Peter Firmstone <j...@zeus.net.au> wrote:
> Dan Creswell wrote:
>>>>
>>>> Where "bug" is potentially a swallowed exception.
>>>>
>>>> Reggie for the most part holds a writeLock for any significant
>>>> invocation from the client and events are dispatched via a
>>>> TaskManager. Seems like an unlikely source for problems. Am I right in
>>>> assuming the test itself is single-threaded? Sure looks like it is?
>>>>
>>>> That would suggest to me a problem in the remote comms layer if
>>>> anywhere. My biggest worry would be that your permissions work is
>>>> leading to a security exception or similar and it's just being
>>>> swallowed.
>>>>
>>>> Take that worry and combine it with the fact that the last failing
>>>> test does throw an exception and doesn't touch remote services or
>>>> indeed the JERI layer, I'd say this is the test to look at first and
>>>> perhaps we're not looking at an additional bug.
>>>>
>>>>
>>>>
>>>
>>> Nope, it's a concurrency bug. TaskManager would create task threads for
>>> the events; the bug goes away when I activate security debug, and
>>> there are no permission failures.
>>>
>>>
>>
>>
>> Sure, it's a concurrency bug; I just happen to think all the symptoms
>> point at the new code being the culprit. Activating all that security
>> debug reduces interleaving and so reduces the chance of, e.g., a race
>> condition occurring.
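
A generic illustration of the interleaving point (not River code; the class and method names are hypothetical): the unsynchronized racyCount++ below is a read-modify-write race, so concurrent increments can be lost, while the AtomicInteger never loses one. Anything that slows or serializes the threads narrows the window in which two threads interleave inside the increment, which is how extra debug output can make a race appear to go away.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative lost-update demo (hypothetical names), not part of River.
public class LostUpdateDemo {
    private static int racyCount;                          // plain field: ++ is read-modify-write, not atomic
    private static final AtomicInteger safeCount = new AtomicInteger();

    /** Runs `threads` workers, each incrementing both counters `perThread` times; returns {racy, safe}. */
    public static int[] run(int threads, int perThread) {
        racyCount = 0;
        safeCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    racyCount++;                 // two threads can read the same value; one update is lost
                    safeCount.incrementAndGet(); // atomic, never loses an update
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                        // join() makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return new int[] { racyCount, safeCount.get() };
    }

    public static void main(String[] args) {
        int[] r = run(4, 1_000_000);
        System.out.println("expected 4000000, racy " + r[0] + ", atomic " + r[1]);
    }
}
```

On a multi-core machine the racy total usually comes out short of the expected figure, by an amount that varies from run to run. Adding a System.out.println inside the loop (roughly what security debug output does) often hides the shortfall, because PrintStream serializes its writers.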

I did a quick review of your code last night and didn't see anything
immediately worrying; however, I did notice that the behaviour of
elements() has changed. I couldn't immediately think of a scenario
where that would be the problem, but it's worth thinking about a bit
more.

>>
>
>
> Well, it's not livelock and it's not deadlock, although in one test I
> can induce a deadlock (no progress, no CPU load) with the debugger. So
> it isn't a loop or CAS; it's some kind of synchronization deadlock.

Mmmm, the trouble is that the debugger itself (or at least its agent)
could be the problem; they aren't typically perfect devices.

>
> Considering I have trouble printing the ProtectionDomain in the debugger,
> could this be a stale reference, or is there no relation?

It mightn't be stale; it could be corrupted as the result of some
broken/loose locking somewhere. That might relate to your deadlock
above (did you get a full thread dump from the JVM?), as it mightn't be
a deadlock at all but a corruption that brings everything to a halt.
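
On the thread-dump question: on HotSpot, jstack <pid> (or kill -3 <pid> on Unix, Ctrl-Break in a Windows console) prints every thread's stack along with any Java-level deadlock it detects. If attaching a tool is awkward, the same check can be made from inside a test via java.lang.management.ThreadMXBean. The sketch below (class and method names are hypothetical, not part of River or its test harness) deliberately creates a lock-ordering deadlock and then detects it. A hang where the JVM reports no deadlocked threads, with everything parked or waiting, would fit the "not a deadlock but a halt" theory better than a classic lock cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Illustrative deadlock probe (hypothetical names), not part of River.
public class DeadlockProbe {

    /** IDs of threads deadlocked on monitors (and j.u.c. locks where supported); empty if none. */
    public static long[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()         // monitors + ownable synchronizers
                : mx.findMonitorDeadlockedThreads(); // monitors only
        return ids == null ? new long[0] : ids;      // both return null when nothing is deadlocked
    }

    /** Polls findDeadlocks() until it reports at least two threads or the timeout expires. */
    public static long[] awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long[] ids = findDeadlocks();
        while (ids.length < 2 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            ids = findDeadlocks();
        }
        return ids;
    }

    /** Prints each thread's state and lock owner (ThreadInfo.toString() shows at most 8 frames). */
    public static void printThreads(long[] ids) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids,
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported())) {
            if (info != null) System.out.print(info);   // null if the thread has since died
        }
    }

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    public static void induceDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startLocker("locker-1", a, b, bothHoldFirst);
        startLocker("locker-2", b, a, bothHoldFirst);
    }

    private static void startLocker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await();                  // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {             // blocks forever: the other thread owns it
                    System.out.println(name + " got both locks (unexpected)");
                }
            }
        }, name);
        t.setDaemon(true);                          // don't keep the JVM alive
        t.start();
    }

    public static void main(String[] args) {
        induceDeadlock();
        long[] ids = awaitDeadlock(2_000);
        System.out.println("deadlocked threads: " + ids.length);
        printThreads(ids);
    }
}
```

In a real test one would more likely call findDeadlocks() and printThreads() from a watchdog once progress stalls, rather than inducing a deadlock on purpose.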

>
>
>> And the fact that there are no lost events unless your code is present
>> can cut both ways, but the simplest explanation would be a bug in your
>> code, not a bug in TaskManager, which has virtually nothing to do with
>> security.
>>
>
> If there's a concurrency issue in my code, it's in DynamicPolicyProvider or
> one of the classes it uses, not ConcurrentPolicyFile, since this bug still
> occurs when I replace it with Sun's implementation.

Okay, so that helps us zone in a little...

>
> Peter.
>
