On 25 December 2011 21:54, Peter Firmstone <j...@zeus.net.au> wrote:
> Dan Creswell wrote:
>>>>
>>>> Where "bug" is potentially a swallowed exception.
>>>>
>>>> Reggie for the most part holds a writeLock for any significant
>>>> invocation from the client and events are dispatched via a
>>>> TaskManager. Seems like an unlikely source for problems. Am I right in
>>>> assuming the test itself is single-threaded? Sure looks like it is?
>>>>
>>>> That would suggest to me a problem in the remote comms layer if
>>>> anywhere. My biggest worry would be that your permissions work is
>>>> leading to a security exception or similar and it's just being
>>>> swallowed.
>>>>
>>>> Take that worry and combine it with the fact that the last failing
>>>> test does throw an exception and doesn't touch remote services or
>>>> indeed the JERI layer, I'd say this is the test to look at first and
>>>> perhaps we're not looking at an additional bug.
>>>>
>>>
>>> Nope, it's a concurrency bug, TaskManager would create task threads for
>>> the events, the bug goes away when I activate security debug, there are
>>> no permission failures.
>>>
>>
>> Sure, it's a concurrency bug I just happen to think all the symptoms
>> point at the new code being the culprit. Activating all that security
>> debug reduces interleaving and such reducing the chance of e.g. a race
>> condition occurring.
I did a quick review of your code last night, didn't see anything
immediately worrying however I did notice the behaviour of elements()
has changed. I couldn't immediately think of a scenario where that would
be the problem but, worth thinking a bit more about.
>>
>
> Well, it's not livelock, it's not deadlock, although in one test I can
> induce deadlock (no progress, no cpu load) with the debugger, so it isn't a
> loop, or CAS, it's some kind of synchronization deadlock.

Mmmm, trouble is the debugger itself (or at least its agent) could be
the problem - they aren't perfect devices, typically.

>
> Considering I have trouble printing the ProtectionDomain in the debugger,
> could this be a stale reference, or is there no relation?

Mightn't be stale, could be corrupted as the result of some broken/loose
locking somewhere. That might relate to your deadlock above (did you get
a full thread dump from the JVM?) as it mightn't be a deadlock but a
corruption that brings everything to a halt.

>
>> And the fact that there are no lost events unless your code is present
>> can cut both ways but the simplest explanation would be a bug in your
>> code not a bug in TaskManager which has virtually nothing to do with
>> security.
>>
>
> If there's a concurrency issue in my code, it's in DynamicPolicyProvider or
> one of the classes it uses, not ConcurrentPolicyFile, since this bug still
> occurs when I replace it with Sun's implementation.

Okay, so that helps us zone in a little...

>
> Peter.
>
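
On the thread dump question above: if attaching the debugger is itself suspect, one option might be to take the dump from inside the JVM under test rather than through an agent. What follows is only a rough sketch, assuming a Java 6 or later JVM. The class and method names (ThreadDumpSketch, fullThreadDump, deadlockedThreadIds) are made up for illustration and are not part of River or the test harness; only the java.lang.management calls are standard API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of River: dumps threads from inside the JVM.
final class ThreadDumpSketch {

    /** Builds a text dump of every live thread, with full stack traces. */
    static String fullThreadDump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for lock details the JVM says it can provide.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The lock this thread is blocked or waiting on, and who holds it.
                sb.append(" on ").append(info.getLockName())
                  .append(" owned by ").append(info.getLockOwnerName());
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Returns ids of threads the JVM reports as deadlocked, or an empty array. */
    static long[] deadlockedThreadIds() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()          // monitors and j.u.c. locks
                : mx.findMonitorDeadlockedThreads();  // monitors only
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + deadlockedThreadIds().length);
        System.out.print(fullThreadDump());
    }
}
```

If the test stalls and deadlockedThreadIds() comes back empty, that would at least be consistent with the "not a deadlock but a corruption" reading, although a missed notify or a thread parked forever would look the same, so the full dump is still worth reading by hand.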