----- Original message -----
> On 25 December 2011 21:54, Peter Firmstone <j...@zeus.net.au> wrote:
> > Dan Creswell wrote:
> > > > > Where "bug" is potentially a swallowed exception.
> > > > >
> > > > > Reggie for the most part holds a writeLock for any significant
> > > > > invocation from the client and events are dispatched via a
> > > > > TaskManager. Seems like an unlikely source for problems. Am I right in
> > > > > assuming the test itself is single-threaded? Sure looks like it is?
> > > > >
> > > > > That would suggest to me a problem in the remote comms layer if
> > > > > anywhere. My biggest worry would be that your permissions work is
> > > > > leading to a security exception or similar and it's just being
> > > > > swallowed.
> > > > >
> > > > > Take that worry and combine it with the fact that the last failing
> > > > > test does throw an exception and doesn't touch remote services or
> > > > > indeed the JERI layer, I'd say this is the test to look at first and
> > > > > perhaps we're not looking at an additional bug.
> > > >
> > > > Nope, it's a concurrency bug, TaskManager would create task threads for
> > > > the events, the bug goes away when I activate security debug, there are
> > > > no permission failures.
> > >
> > > Sure, it's a concurrency bug I just happen to think all the symptoms
> > > point at the new code being the culprit. Activating all that security
> > > debug reduces interleaving and such reducing the chance of e.g. a race
> > > condition occurring.
>
> I did a quick review of your code last night,

Thanks =)

> didn't see anything
> immediately worrying however I did notice the behaviour of elements()
> has changed. I couldn't immediately think of a scenario where that
> would be the problem but, worth thinking a bit more about.

The DynamicPolicyProvider originally added to (mutated after publishing)
the ConcurrentPermissions collections after they had been added to the
ConcurrentMap cache. They are now only mutated before publishing and
replaced afterwards if necessary. This avoids
ConcurrentModificationException and also reduces the possibility of
missing Permissions. That's the theory anyway.

I did find some problems with MergedPolicyProvider, which is part of the
qa suite. Fixing them has stabilised the debugger, but hasn't fixed the
lost events. I'll commit it soon.

The GetContextTest failure is a separate bug; it goes away with the sun
policy provider. I'm working on it at present.

Cheers,

Peter.

> > Well, it's not livelock, it's not deadlock, although in one test I can
> > induce deadlock (no progress, no cpu load) with the debugger, so it isn't a
> > loop, or CAS, it's some kind of synchronization deadlock.
>
> Mmmm, trouble is the debugger itself (or at least its agent) could be
> the problem - they aren't perfect devices, typically.
>
> > Considering I have trouble printing the ProtectionDomain in the debugger,
> > could this be a stale reference, or is there no relation?
>
> Mightn't be stale, could be corrupted as the result of some
> broken/loose locking somewhere. That might relate to your deadlock
> above (did you get a full thread dump from the JVM?) as it mightn't be
> a deadlock but a corruption that brings everything to a halt.
>
> > > And the fact that there are no lost events unless your code is present
> > > can cut both ways but the simplest explanation would be a bug in your
> > > code not a bug in TaskManager which has virtually nothing to do with
> > > security.
> >
> > If there's a concurrency issue in my code, it's in DynamicPolicyProvider or
> > one of the classes it uses, not ConcurrentPolicyFile, since this bug still
> > occurs when I replace it with Sun's implementation.
>
> Okay, so that helps us zone in a little...
>
> > Peter.
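
For readers following the thread, here is a minimal sketch of the "mutate before publishing, replace afterwards" approach Peter describes for DynamicPolicyProvider. It is not the River code. The class name PublishThenReplaceCache, the String keys standing in for a ProtectionDomain, and the String values standing in for Permissions are all assumptions made for illustration. The point it tries to show is that a collection is fully built before it goes into the ConcurrentMap, and an update swaps in a new collection rather than touching one that readers may already hold.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; names and types are not taken from River.
public class PublishThenReplaceCache {
    // Each value is an unmodifiable snapshot, never mutated once published.
    private final ConcurrentMap<String, Set<String>> cache =
            new ConcurrentHashMap<>();

    // Adds a permission by building a new snapshot and swapping it in.
    public void grant(String domain, String permission) {
        while (true) {
            Set<String> current = cache.get(domain);
            // Mutate a private copy before it is visible to other threads.
            Set<String> next = new HashSet<>();
            if (current != null) {
                next.addAll(current);
            }
            next.add(permission);
            Set<String> frozen = Collections.unmodifiableSet(next);
            if (current == null) {
                // First publication for this key.
                if (cache.putIfAbsent(domain, frozen) == null) {
                    return;
                }
            } else if (cache.replace(domain, current, frozen)) {
                // Swapped in while the mapping still matched what we read.
                return;
            }
            // Another thread changed the mapping first; retry on fresh state.
        }
    }

    // Readers iterate a snapshot, so no ConcurrentModificationException
    // can arise from a concurrent grant.
    public boolean implies(String domain, String permission) {
        Set<String> snapshot = cache.get(domain);
        return snapshot != null && snapshot.contains(permission);
    }
}
```

A reader that obtained a snapshot before a grant simply sees the older set. Whether that staleness is acceptable for a policy provider is a separate design question that this sketch does not settle.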