[
https://issues.apache.org/jira/browse/JENA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271403#comment-13271403
]
Stephen Owens commented on JENA-244:
------------------------------------
Versions involved:
==============
jena-arq-2.9.0-incubating.jar
jena-core-2.7.0-incubating.jar
We discovered a Jena deadlock during some long running stability tests. Jena is
being used in a service accessed by a web application so there were multiple
web threads accessing the same underlying model. The model was protected by the
standard enterCriticalSection calls, set for read because we were just
querying. This worked perfectly until we switched to an RDFS model. The RDFS
model exhibited occasional deadlocks.
The deadlock appears to be an issue with out of order lock acquisition. The
first thread acquired a lock on LPTopGoalIterator and then tried to get a lock
LPBRuleEngine, the second thread had a lock on LPBRuleEngine and tried to get a
lock on LPTopGoalIterator with the expected result that nobody is going
anywhere from that point forward. I traced through the relevant Jena code and
the issue seems to be that one of the two queries is the first query against
that model and so is triggering the inferencing model to add rules to the
model. In that first query the following sequence happens:
- Because it is the first query against an inference model the
FBRuleInfGraph.prepare method triggers the addition of inferences to the model
which turns the read only execution of the query into an update of the model.
- The update uses the LPBRuleEngine.addRule method which is synchronized thus
acquiring a lock on LPBRuleEngine.
- Before trying to update it calls LPBRuleEngine.checkSafeToUpdate to see if
there are any outstanding queries.
- Surprisingly, at least to me in reviewing the code, checkSafeToUpdate tries
to close any statement iterators it finds open. I don't see how this could be a
safe thing to do in a multi-threaded environment but I don't understand the
code well enough to be sure.
- That close operation calls LPTopGoalIterator.close which is a synchronized
method and waits at that point until it can acquire a lock on LPTopGoalIterator.
Meanwhile a separate thread executing a query is doing this:
- Using LPTopGoalIterator.moveForward to move through its goals.
- Since this is a synchronized method it acquires a lock on LPTopGoalIterator
- That method synchronizes on its LPBRuleEngine and waits until it can acquire
the lock.
And at this point we're in a deadlock.
In terms of a work around I can probably call prepare before querying the model
the first time. That will likely work as long as I'm not planning to write to
the model after the initial prepare since any write would invalidate the
preparation and leave it prone to deadlock again. In my particular case the
information is likely static so that might work.
The longer term fix will require a change to the lock acquisition strategy. the
LPTopGoalIterator.hasNext method could synchronize on the engine before calling
moveForward, moveForward could become not synchronized and internally
synchronize on LPBRuleEngine first and only after it acquires that synchronize
on itself. Alternately the checkSafeToUpdate could stop trying to close
external iterators and either throw an exception or wait and retry if it finds
any open iterators. In the case I'm seeing I doubt the retry strategy would be
viable since neither thread yet have a valid model.
I suspect that the right solution may be much further up the stack. Maybe the
model preparation should be done higher before it gets so deep into the SPARQL
execution phase. That might allow for a cleaner locking strategy that doesn't
allow multiple SPARQL evaluations to start until the model is stable. Maybe all
the way up at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct()?
At this point I'm just guessing because I don't know that section of the code
well enough. I'm not submitting a patch because I'd like feedback from someone
that knows this code well on their suggested approach.
Here's the trace of a deadlock:
at
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.moveForward(LPTopGoalIterator.java:83)
- waiting to lock <0x0000000779070940> (a
com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine)
- locked <0x0000000776153ff0> (a
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator)
at
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:196)
at
com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
at
com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
at
com.hp.hpl.jena.util.iterator.UniqueExtendedIterator.hasNext(UniqueExtendedIterator.java:78)
at
com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:76)
at
com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:151)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:79)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:64)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:60)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:254)
at
com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:223)
"http-8080-9":
at
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.close(LPTopGoalIterator.java:169)
- waiting to lock <0x0000000776153ff0> (a
com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator)
at
com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine.checkSafeToUpdate(LPBRuleEngine.java:235)
at
com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine.addRule(LPBRuleEngine.java:129)
- locked <0x0000000779070940> (a
com.hp.hpl.jena.reasoner.rulesys.impl.LPBRuleEngine)
at
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.addBRules(FBRuleInfGraph.java:290)
at
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.preloadDeductions(FBRuleInfGraph.java:903)
at
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.prepare(FBRuleInfGraph.java:483)
- locked <0x0000000778e81270> (a
com.hp.hpl.jena.reasoner.rulesys.RDFSRuleInfGraph)
at
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.findWithContinuation(FBRuleInfGraph.java:574)
at
com.hp.hpl.jena.reasoner.rulesys.FBRuleInfGraph.graphBaseFind(FBRuleInfGraph.java:606)
at com.hp.hpl.jena.graph.impl.GraphBase.find(GraphBase.java:285)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.<init>(QueryIterTriplePattern.java:80)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterTriplePattern.nextStage(QueryIterTriplePattern.java:53)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:113)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:106)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:64)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:106)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.makeNextStage(QueryIterRepeatApply.java:106)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:65)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at
com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:108)
at
com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:254)
at
com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:223)
> Deadlock during SPARQL execution on an inference model
> ------------------------------------------------------
>
> Key: JENA-244
> URL: https://issues.apache.org/jira/browse/JENA-244
> Project: Apache Jena
> Issue Type: Bug
> Components: Jena
> Reporter: Stephen Owens
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira