[
https://issues.apache.org/jira/browse/COCOON-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578347#action_12578347
]
Alexander Daniel commented on COCOON-1985:
------------------------------------------
Two requests can deadlock each other in Cocoon 2.1.11 (without use of parallel
with include transformer):
* request A: generating lock for 55933
* request B: generating lock for 58840
* request B: waiting for lock 55933 which is hold by request A
* request A: waiting for lock 58840 which is hold by request B
I can reproduce this behaviour with Apache Bench and following pipeline:
* terminal 1: Apache Bench request A (ab -k -n 10000 -c 25
http://localhost:8888/samples/reproduceMultipleThreads/productOfferForDevice/55933/)
* terminal 2: Apache Bench request B (ab -k -n 10000 -c 25
http://localhost:8888/samples/reproduceMultipleThreads/productOfferForDevice/58840/)
* terminal 3: touching the two data files every second to invalidate the cache
(while true; do echo -n "."; touch 55933.xml 58840.xml; sleep 1; done)
* pipeline:
<map:pipeline type="caching">
<map:match pattern="productOfferForDevice*/*/">
<map:generate src="cocoon:/exists/{2}.xml" label="a"/>
<map:transform type="xsltc" src="productOfferIncludeDevice.xsl" label="b">
<map:parameter name="noInc" value="{1}"/>
</map:transform>
<map:transform type="include" label="c"/>
<map:serialize type="xml"/>
</map:match>
<map:match pattern="exists/**">
<map:act type="resource-exists">
<map:parameter name="url" value="{1}" />
<map:generate src="{../1}" />
<map:serialize type="xml" />
</map:act>
<!-- not found -->
<map:generate src="dummy.xml" />
<map:serialize type="xml" />
</map:match>
</map:pipeline>
After some seconds the deadlock occurs ==>
* Apache Bench requests run into a timeout
* I can see following pipe locks in the default transient store:
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/55933.xml?pipelinehash=-910770960103935149_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
(class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/58840.xml?pipelinehash=-4996088883111986478_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
(class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
(class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml
(class: org.mortbay.util.ThreadPool$PoolThread)
I added some logging to AbstractCachingProcessingPipeline.java which reconfirms
the explanations above:
INFO (2008-03-13) 13:50.16:072 [sitemap]
(/samples/reproduceMultipleThreads/productOfferForDevice/55933/)
PoolThread-47/AbstractCachingProcessingPipeline: generating lock
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/55933.xml?pipelinehash=-910770960103935149_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
INFO (2008-03-13) 13:50.16:074 [sitemap]
(/samples/reproduceMultipleThreads/productOfferForDevice/55933/)
PoolThread-47/AbstractCachingProcessingPipeline: generating lock
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
INFO (2008-03-13) 13:50.16:075 [sitemap]
(/samples/reproduceMultipleThreads/productOfferForDevice/58840/)
PoolThread-6/AbstractCachingProcessingPipeline: generating lock
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/58840.xml?pipelinehash=-4996088883111986478_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
INFO (2008-03-13) 13:50.16:075 [sitemap]
(/samples/reproduceMultipleThreads/productOfferForDevice/58840/)
PoolThread-6/AbstractCachingProcessingPipeline: generating lock
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml
INFO (2008-03-13) 13:50.16:281 [sitemap]
(/samples/reproduceMultipleThreads/productOfferForDevice/58840/)
PoolThread-6/AbstractCachingProcessingPipeline: waiting for lock
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
INFO (2008-03-13) 13:50.16:304 [sitemap]
(/samples/reproduceMultipleThreads/productOfferForDevice/55933/)
PoolThread-47/AbstractCachingProcessingPipeline: waiting for lock
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml
With the attached reproduceMultipleThreads.tar.gz you can reproduce the
behaviour yourself:
* download and extract Cocoon 2.1.11
* cd $CocoonHome
* ./build.sh
* cd build/webapp/samples
* tar -xzf $DownloadFolder/reproduceMultipleThreads.tar.gz
* cd ../../..
* ./cocoon.sh
* open 3 terminals and cd into
$CocoonHome/build/webapp/samples/reproduceMultipleThreads in each
* dry run without invalidating the cache to see that everything is working:
- terminal 1: ./terminal1.sh
- terminal 2: ./terminal2.sh
* run with invalidating the cache every seconds:
- terminal 1: ./terminal1.sh
- terminal 2: ./terminal2.sh
- terminal 3: ./terminal3.sh
* When Apache Bench has run into a timeout you can view the pipelocks with
http://localhost:8888/samples/reproduceMultipleThreads/pipelocks
We are currently facing this issue on our production servers.
WHAT IS THE BEST WAY TO FIX THAT?
* removing the pipeline locking code as Ellis suggested?
* making waitForLock fuzzy
* ...
If the pipelock design is the same in Cocoon 2.2 the same deadlock could occur
there.
> AbstractCachingProcessingPipeline locking with IncludeTransformer may hang
> pipeline
> -----------------------------------------------------------------------------------
>
> Key: COCOON-1985
> URL: https://issues.apache.org/jira/browse/COCOON-1985
> Project: Cocoon
> Issue Type: Bug
> Components: * Cocoon Core
> Affects Versions: 2.1.9, 2.1.10, 2.1.11, 2.2-dev (Current SVN)
> Reporter: Ellis Pritchard
> Priority: Critical
> Fix For: 2.2-dev (Current SVN)
>
> Attachments: caching-trials.patch, includer.xsl, patch.txt,
> sitemap.xmap
>
>
> Cocoon 2.1.9 introduced the concept of a lock in
> AbstractCachingProcessingPipeline, an optimization to prevent two concurrent
> requests from generating the same cached content. The first request adds the
> pipeline key to the transient cache to 'lock' the cache entry for that
> pipeline, subsequent concurrent requests wait for the first request to cache
> the content (by Object.lock()ing the pipeline key entry) before proceeding,
> and can then use the newly cached content.
> However, this has introduced an incompatibility with the IncludeTransformer:
> if the inclusions access the same yet-to-be-cached content as the root
> pipeline, the whole assembly hangs, since a lock will be made on a lock
> already held by the same thread, and which cannot be satisfied.
> e.g.
> i) Root pipeline generates using sub-pipeline cocoon:/foo.xml
> ii) the cocoon:/foo.xml sub-pipeline adds it's pipeline key to the transient
> store as a lock.
> iii) subsequently in the root pipeline, the IncludeTransformer is run.
> iv) one of the inclusions also generates with cocoon:/foo.xml, this
> sub-pipeline locks in AbstractProcessingPipeline.waitForLock() because the
> sub-pipeline key is already present.
> v) deadlock.
> I've found a (partial, see below) solution for this: instead of a plain
> Object being added to the transient store as the lock object, the
> Thread.currentThread() is added; when waitForLock() is called, if the lock
> object exists, it checks that it is not the same thread before attempting to
> lock it; if it is the same thread, then waitForLock() returns success, which
> allows generation to proceed. You loose the efficiency of generating the
> cache only once in this case, but at least it doesn't hang! With JDK1.5 this
> can be made neater by using Thread#holdsLock() instead of adding the thread
> object itself to the transient store.
> See patch file.
> However, even with this fix, parallel includes (when enabled) may still hang,
> because they pass the not-the-same-thread test, but fail because the root
> pipeline, which holds the initial lock, cannot complete (and therefore
> statisfy the lock condition for the parallel threads), before the threads
> themselves have completed, which then results in a deadlock again.
> The complete solution is probably to avoid locking if the lock is held by the
> same top-level Request, but that requires more knowledge of Cocoon's
> processing than I (currently) have!
> IMHO unless a complete solution is found to this, then this optimization
> should be removed completely, or else made optional by configuration, since
> it renders the IncludeTransformer dangerous.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.