[ 
https://issues.apache.org/jira/browse/COCOON-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569720#action_12569720
 ] 

Ellis Pritchard commented on COCOON-1985:
-----------------------------------------

I'm still not happy with this.

1/ The original patch for 2.1.x had a problem (as I explained) where parallel 
includes can dead-lock; this is neatly by-passed in 2.2.x, in a way that is not 
possible in 2.1.x (see http://svn.apache.org/viewvc?view=rev&revision=601871). 
However...

2/ I'm convinced that this code could still dead-lock; the waiter relies on 
being woken by lock.notifyAll(), however, this notification could conceivably 
happen between getting the lock object from the transient store, and actually 
doing a lock.wait() to wait to be notified, thus the waiter would not be woken 
by the thread that holds the lock:

    protected boolean waitForLock(Object key) {
        if(transientStore != null) {
            Object lock = null;
            synchronized(transientStore) {
                String lockKey = PIPELOCK_PREFIX+key;
                if(transientStore.containsKey(lockKey)) {
                    // cache content is currently being generated, wait for 
other thread
                    lock = transientStore.get(lockKey);
                }
            }
            /***
             *** BIG SCARY UN-SYNCHRONIZED GAP
             *** lock.notifyAll() could be called here, and we'd not hear about 
it.
             ***/
            // Avoid deadlock with self (see JIRA COCOON-1985).
            if(lock != null && lock != Thread.currentThread()) {
                try {
                    // become owner of monitor
                    synchronized(lock) {
                        /*** WAIT UNTIL OTHER THREAD CALLS notifyAll() again  
***/
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    if(getLogger().isDebugEnabled()) {
                        getLogger().debug("Got interrupted waiting for other 
pipeline to finish processing, retrying...",e);
                    }
                    return false;
                }
                if(getLogger().isDebugEnabled()) {
                    getLogger().debug("Other pipeline finished processing, 
retrying to get cached response.");
                }
                return false;
            }
        }
        return true;
    }


Fortunately, because I used the Thread object as the lock object in my patch 
for 2.1.x, the unfortunate waiting thread should only (!) have to wait until 
the processor thread that set the lock is re-used for generating cacheable 
content, (and runs notifyAll() in releaseLock()), but that could still be a 
significant period. However, with the new 2.2 patch, the RequestAttributes 
object is never going to be re-used, so will never be notified, so we are in a 
complete dead-lock again.

A work-around (i.e. hack) for this is, instead of using Object#wait(), use 
Object#wait(long timeout), i.e. replace

                        lock.wait();
above, with
                        lock.wait(1000);

i.e. wait for notification, or 1000ms (pick a number); the calling code loops 
calling waitForLock() so exiting without being notified is safe, however, it's 
still a hack.

Basically, the whole idea behind pipe-line locking is a great one, but this 
implementation is full of danger, and, worse, the sort of danger you only 
encounter on a very busy system. My original recommendation was to remove this 
code; I still stand by that.



> AbstractCachingProcessingPipeline locking with IncludeTransformer may hang 
> pipeline
> -----------------------------------------------------------------------------------
>
>                 Key: COCOON-1985
>                 URL: https://issues.apache.org/jira/browse/COCOON-1985
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core
>    Affects Versions: 2.1.9, 2.1.10, 2.1.11, 2.2-dev (Current SVN)
>            Reporter: Ellis Pritchard
>            Priority: Critical
>             Fix For: 2.2-dev (Current SVN)
>
>         Attachments: caching-trials.patch, includer.xsl, patch.txt, 
> sitemap.xmap
>
>
> Cocoon 2.1.9 introduced the concept of a lock in 
> AbstractCachingProcessingPipeline, an optimization to prevent two concurrent 
> requests from generating the same cached content. The first request adds the 
> pipeline key to the transient cache to 'lock' the cache entry for that 
> pipeline, subsequent concurrent requests wait for the first request to cache 
> the content (by Object.lock()ing the pipeline key entry) before proceeding, 
> and can then use the newly cached content.
> However, this has introduced an incompatibility with the IncludeTransformer: 
> if the inclusions access the same yet-to-be-cached content as the root 
> pipeline, the whole assembly hangs, since a lock will be made on a lock 
> already held by the same thread, and which cannot be satisfied.
> e.g.
> i) Root pipeline generates using sub-pipeline cocoon:/foo.xml
> ii) the cocoon:/foo.xml sub-pipeline adds it's pipeline key to the transient 
> store as a lock.
> iii) subsequently in the root pipeline, the IncludeTransformer is run.
> iv) one of the inclusions also generates with cocoon:/foo.xml, this 
> sub-pipeline locks in AbstractProcessingPipeline.waitForLock() because the 
> sub-pipeline key is already present.
> v) deadlock.
> I've found a (partial, see below) solution for this: instead of a plain 
> Object being added to the transient store as the lock object, the 
> Thread.currentThread() is added; when waitForLock() is called, if the lock 
> object exists, it checks that it is not the same thread before attempting to 
> lock it; if it is the same thread, then waitForLock() returns success, which 
> allows generation to proceed. You loose the efficiency of generating the 
> cache only once in this case, but at least it doesn't hang! With JDK1.5 this 
> can be made neater by using Thread#holdsLock() instead of adding the thread 
> object itself to the transient store.
> See patch file.
> However, even with this fix, parallel includes (when enabled) may still hang, 
> because they pass the not-the-same-thread test, but fail because the root 
> pipeline, which holds the initial lock, cannot complete (and therefore 
> statisfy the lock condition for the parallel threads), before the threads 
> themselves have completed, which then results in a deadlock again.
> The complete solution is probably to avoid locking if the lock is held by the 
> same top-level Request, but that requires more knowledge of Cocoon's 
> processing than I (currently) have!
> IMHO unless a complete solution is found to this, then this optimization 
> should be removed completely, or else made optional by configuration, since 
> it renders the IncludeTransformer dangerous.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to