Yeah -- that's right the deadlock is caused by lack of synchronization during "destroy" on the second valueUnbound event. I think (but haven't confirmed yet) that the same thing could happen with different implementations of the ServletContainerAdapter.ensureFailover method if it calls "session.setAttribute(...)". Just a hunch, though, and I need to confirm this.
Back to the deadlock, absence of synchronization on the "destroy" lifecycle call allows a second thread to enter into the ControlBeanContext which can cause the thread to deadlock with a second thread that's already in the ControlBeanContext. > I don't think that's an option here, since it voids the whole purpose of > DeferredSessionStorageHandler > (http://mail-archives.apache.org/mod_mbox/beehive-dev/200601.mbox/[EMAIL > PROTECTED] > ) -- am I understanding you here? Yeah, that makes sense -- just wanted to rule this out as an option. :) > > Otherwise, I wonder if there's a way to track this without adding state > to the page flow instance. Possibly in a request attribute? Just a > thought... Sure -- this would certainly work. It could take the form of a set of attribute names that should be persisted / failed over, but that should ignore lifecycle methods. Will look at this and get back to you. Thanks for the comments... Eddie > > Rich > > Eddie O'Neil wrote: > > Gotcha -- thanks for the info; glad to know I wasn't entirely out in > > left field. :) > > > > Given this and putting the issue of the session expiration aside for > > a second, I think that root cause of the problem is that it is > > possible for the DeferredSessionStorageHandler to run the > > "valueUnbound" method code on a PageFlowController twice in these two > > circumstances: > > > > - DeferredSessionStorageHandler::setAttribute / remove Attribute > > - DeferredSessionStorageHandler::applyChanges @ line 221 > > > > The first runs in "real time" with the removal or replacement of a > > Page Flow in the session and runs inside of the lock, but the second > > is the deferred part that runs at the end of a request to persist > > changed attributes in the session. The "applyChanges" then makes > > calls to set / remove the Page Flow attribute from the real > > HttpSession, and this triggers the "valueUnbound" event and ultimately > > an "onDestroy". This latter step is unsynchronized and thus causes > > the deadlock. > > > > If the Page Flow is stored in the session and is an > > HttpSessionBindingListener, any deferred calls are going to trigger > > this event. In order to keep consecutive JPF invocations in the same > > order, the onDestroy of the previous JPF needs to run before onCreate > > of the next JPF. The cheap way to fix this would be to track the > > state of "valueUnbound" as a private boolean inside of the > > PageFlowController. Then, when the second valueUnbound call is made, > > the event will be a no-op on the JPF. > > > > Another solution would be to tunnel down into the real HttpSession > > and make JPF changes "live" against the session rather than deferring > > them. > > > > Have an opinion either way? > > > > Eddie > > > > > > > > On 4/6/06, Rich Feit <[EMAIL PROTECTED]> wrote: > > > >> Funny, I'd drafted a longish email earlier asking about the BeanContext > >> lock, because it sounded like the *fundamental* issue (ordering problems > >> in locking that and the control bean instance itself). But it's not, or > >> at least it's unrelated. In a way, that's good. :) > >> > >> Regarding this: > >> > >>> Thread A is execution an action on JPF X > >>> Thread B is destroying JPF X en route to running JPF Y > >>> > >>> From the code in PageFlowController.persistInSession (transitively > >>> down to StorageHandler.removeAttribute), it seems like a lock is held > >>> on the "current" JPF that would prevent this. Right? > >>> > >> That's correct. The only time onDestroy() should ever run without a > >> lock on the page flow instance is when the session is expiring. That's > >> the situation where the session itself is already locked, so it's > >> dangerous to subsequently lock the page flow instance (while another > >> thread may be accessing the session through synchronized get() after > >> already locking the page flow instance). The user could also simulate > >> this situation by explicitly removing the page flow session attribute, > >> but of course that's not something we should try to support. > >> > >> Let me know if you see/find any holes in that, though... > >> > >> Rich > >> > >> Eddie O'Neil wrote: > >> > >>> Rich-- > >>> > >>> Having looked into this more now, I've got a little more detail > >>> about what's happening: > >>> > >>> There are two issues intertwined here. The first issue is related > >>> to the ControlBeanContext implementation class and its usage of a > >>> global, VM-wide lock. At issue is the fact that all > >>> ControlContainerContext implementations inherit from a > >>> BeanContextServicesSupport base class. This class (and another in > >>> that hierarchy) use a the VM-wide object > >>> BeanContext.globalHierarchyLock for synchronization in two situations: > >>> > >>> - adding / removing a control to / from a context > >>> - adding / removing / getting a "service" from a context > >>> > >>> This acts as a choke point for all Control references and is what > >>> shows up in the locking traces in a previous mail. This is > >>> problematic, but it's also orthogonal (I believe) to the issue with > >>> the "destroy" lifecycle method on a JPF. Ultimately, I think that the > >>> JDK assumed that the classes in java.beans.beancontext.* would be used > >>> to run a single hierarchy of controls on a VM. Beehive's usage > >>> pattern is a little different in that in enterprise applications, many > >>> hierarchies of beans run on a VM (one per JPF, in NetUI's case). As > >>> such, a global lock doesn't fit well. > >>> > >>> So, hopefully that provides some background on the > >>> BeanContext.globalHierarchyLock. > >>> > >>> That being said, I think that the deadlock is orthogonal to this > >>> problem as the "destroy" lifecycle method on a JPF can be invoked > >>> without waiting for a lock on the JPF instance itself. The result is > >>> that two threads can actually run through a JPF simultaneously. > >>> > >>> With Daryl's changes in February, the ControlContainerContext object > >>> for a Page Flow is a class-level field, and locking is performed on > >>> this object at: > >>> > >>> - create time > >>> - action execution time > >>> - page rendering time (JSP and Faces) > >>> > >>> Guess that the question is whether this situation is protected: > >>> > >>> Thread A is execution an action on JPF X > >>> Thread B is destroying JPF X en route to running JPF Y > >>> > >>> >From the code in PageFlowController.persistInSession (transitively > >>> down to StorageHandler.removeAttribute), it seems like a lock is held > >>> on the "current" JPF that would prevent this. Right? > >>> > >>> If that's true, then I've got a little more info to go on in > >>> investigating this problem. > >>> > >>> Thanks for any info... :) > >>> > >>> Eddie > >>> > >>> > >>> > >>> > >>> On 3/31/06, Eddie O'Neil <[EMAIL PROTECTED]> wrote: > >>> > >>> > >>>> Good questions -- let's see if I can explain the threads more > >>>> clearly in ASCII. :) > >>>> > >>>> What's happening is this (note, both threads operate on the *same* > >>>> instance of a JPF): > >>>> > >>>> Thread 1 (executing a page flow) Thread2 (destroying a page > >>>> flow) > >>>> acquire lock on FooPage Flow > >>>> acquire > >>>> BC.gHL > >>>> acquire lock on BarControlBean > >>>> wait for > >>>> lock on BarControlBean > >>>> wait for lock on BC.gHL > >>>> > >>>> Because the Controls infrastructure is based on the BeanContext > >>>> (Services|Support|etc) code in the java.beans package of the JDK, it > >>>> uses the BeanContextServicesSupport class as a base class for the > >>>> ControlBeanContext object in a Control. This JDK class (BCSS) uses > >>>> the BeanContext.globalHierarchyLock (also in the JDK) to serialize > >>>> access to shared data structures like the list of "service" objects in > >>>> a BeanContext, etc. This is a lock that is static throughout the JDK > >>>> (!). > >>>> > >>>> To answer your questions: > >>>> > >>>> > >>>> > >>>>> What grabs the lock on BeanContextServicesSupport.globalHierarchyLock? > >>>>> > >>>>> > >>>> The base classes for the ControlBeanContext object which delegates up > >>>> to it's "super" in order to serialize changes and service requests to > >>>> the classes that implement event listener support, etc. > >>>> > >>>> > >>>> > >>>>> What prevents deadlock in general between locking on > >>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean? > >>>>> > >>>>> > >>>> Good question -- this seems like the crux of the issue. In general, > >>>> this is protected by a Control container serializing access to the > >>>> container's ControlContainerContext object. In this case, the problem > >>>> occurs because two threads are trying to setup and destroy the same > >>>> context object simultaneously. If those setup and destroy calls were > >>>> serialized against the ControlContainerContext object stored in the > >>>> HttpSession, presumably, the semantics would be maintained. > >>>> > >>>> My guess is that this deadlock occurred with a double page submit or > >>>> quick refresh that caught the tail end of the previous thread and the > >>>> beginning of the next request such that the threads interleaved in > >>>> this way. > >>>> > >>>> So, hopefully that helps explain what's happening. > >>>> > >>>> Ultimately, I'm not sure how to fix this yet and would appreciate > >>>> thoughts on the topic. Would it be safe to acquire a lock on the JPF > >>>> instance during onDestroy? That would serialize the access to the > >>>> session scoped CCC object. > >>>> > >>>> Another way to go (and it's a lot of work) is to build a NetUI / > >>>> web-tier specific control container that doesn't leverage the *Support > >>>> classes in the JDK. This would allow the container's implementation > >>>> to be tailored to the single-threaded nature of the web tier and would > >>>> remove a lot of the performance implications around locking and > >>>> synchronized data structures. But, I've not thought through this yet, > >>>> either. :) > >>>> > >>>> Eddie > >>>> > >>>> > >>>> > >>>> On 3/30/06, Rich Feit <[EMAIL PROTECTED]> wrote: > >>>> > >>>> > >>>>> Looks like a tough one. First of all, the problem with synchronizing in > >>>>> onDestroy is also a deadlock issue: since onDestroy is called from an > >>>>> HttpSessionBindingListener, the Servlet container may have a lock on the > >>>>> session itself when onDestroy is called. I don't think it's mandated by > >>>>> the Servlet spec, but I don't think it's forbidden either. > >>>>> Unfortunately, this means that a deadlock can occur if code in another > >>>>> thread synchronizes on the page flow instance first (as in an action > >>>>> method invocation), then calls a method on HttpSession that synchronizes > >>>>> on the session object. Again, I don't think this is mandated by the > >>>>> spec, but it happens. > >>>>> > >>>>> Nothing strikes me off the bat, but I actually don't understand the > >>>>> deadlock sequence below (I don't disagree -- just don't have enough info > >>>>> to see it). What grabs the lock on > >>>>> BeanContextServicesSupport.globalHierarchyLock? What prevents deadlock > >>>>> in general between locking on > >>>>> BeanContextServicesSupport.globalHierarchyLock and BarControlBean? > >>>>> > >>>>> Also, how is the same JPF being *created* by two separate threads? > >>>>> > >>>>> Rich > >>>>> > >>>>> Eddie O'Neil wrote: > >>>>> > >>>>> > >>>>>> Rich/Daryl-- > >>>>>> > >>>>>> Hey, I've run into a thread deadlock problem in the interaction > >>>>>> between Controls and Page Flow that happens in the following > >>>>>> circumstance: > >>>>>> > >>>>>> thread1: acquire lock on FooPageFlow (during FlowController.execute) > >>>>>> thread2: acquire lock on > >>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class) > >>>>>> thread1: acquire lock on BarControlBean (ControlBean.ensureControl()) > >>>>>> thread2: wait for lock on BarControlBean (BeanContextSupport.remove()) > >>>>>> thread1: wait for lock on > >>>>>> BeanContextServicesSupport.globalHierarchyLock (JDK class) > >>>>>> > >>>>>> So, the problem is that the same JPF is being both created and > >>>>>> destroyed by two different threads. It appears that the "destroy" > >>>>>> phase of the JPF lifecycle is entirely unsynchronized and can freely > >>>>>> acquire Control locks without having serialized access to the Page > >>>>>> Flow itself. > >>>>>> > >>>>>> My thought is to add a synchronization point in > >>>>>> JavaControlUtils.uninitJavaControls that locks on the > >>>>>> PageFlowManagedObject as this would serialize access to the Page Flow. > >>>>>> But, I seem to recall some threading issues with locking on a JPF > >>>>>> during the "destroy" part of the lifecycle and don't want to resurrect > >>>>>> those. > >>>>>> > >>>>>> Any suggestions about how best to make this fix? > >>>>>> > >>>>>> Eddie > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>> > > > > >
