RE: Aggregators and concurrent use of Query object

Stephen Allen Tue, 20 Sep 2011 08:55:26 -0700

Hi Holger,

I believe you are correct that Query objects with aggregators cannot be
reused by different threads.  They *can* be reused by the same thread, or by
different threads that synchronize the compile step, but even then the Query
object keeps hanging onto a reference to the new aggregator created for each
query execution.


The cause appears to be at line 562 of AlgebraGenerator.java, where the
aggregators added to a Query object are referenced directly by the compiled
query plan.  Instead, the plan should work on copies of the aggregators so
that the original Query object remains immutable.
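To make the problem concrete, here is a simplified, hypothetical model of it in plain Java. The classes (Aggregator, ParsedQuery) and methods (compileShared, compileCopied, execute) are invented for illustration and are not ARQ's internals; the only assumption is that compiling a plan installs a fresh accumulator on an aggregator object. When the plan points at the query's own aggregator, two compiles that interleave before execution end up sharing one accumulator. Giving each plan its own copy leaves the query untouched. Compile-then-execute on a single thread works in both cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the shared-aggregator problem; not ARQ's actual classes.
public class SharedAggregatorDemo {

    // An aggregator whose definition is fixed, but which carries a
    // per-execution accumulator slot that a compile step fills in.
    static final class Aggregator {
        private long[] accumulator; // null until a plan has been compiled

        // Installs a fresh accumulator for a new execution.
        void startNewExecution() {
            accumulator = new long[1];
        }

        void accumulate() {
            if (accumulator == null) {
                throw new IllegalStateException("Null for accumulator");
            }
            accumulator[0]++;
        }

        long getValue() {
            return accumulator[0];
        }

        // A new aggregator with the same definition and no shared state.
        Aggregator copy() {
            return new Aggregator();
        }
    }

    // A parsed query that owns one aggregator definition.
    static final class ParsedQuery {
        final List<Aggregator> aggregators = new ArrayList<>();

        ParsedQuery() {
            aggregators.add(new Aggregator());
        }
    }

    // Unsafe: the plan points straight at the query's aggregator and mutates it.
    static Aggregator compileShared(ParsedQuery q) {
        Aggregator agg = q.aggregators.get(0);
        agg.startNewExecution();
        return agg;
    }

    // Safer: the plan gets its own copy, so the query object is never mutated.
    static Aggregator compileCopied(ParsedQuery q) {
        Aggregator agg = q.aggregators.get(0).copy();
        agg.startNewExecution();
        return agg;
    }

    // Feeds `rows` solutions into a compiled plan and returns the aggregate.
    static long execute(Aggregator plan, int rows) {
        for (int i = 0; i < rows; i++) {
            plan.accumulate();
        }
        return plan.getValue();
    }

    public static void main(String[] args) {
        ParsedQuery query = new ParsedQuery();

        // One interleaving two threads could produce: both compile, then both execute.
        Aggregator a = compileShared(query);
        Aggregator b = compileShared(query); // same object as a; replaces a's accumulator
        long sharedA = execute(a, 3);
        long sharedB = execute(b, 3);

        Aggregator c = compileCopied(query);
        Aggregator d = compileCopied(query); // independent of c
        long copiedC = execute(c, 3);
        long copiedD = execute(d, 3);

        System.out.println("shared plans: " + sharedA + ", " + sharedB);
        System.out.println("copied plans: " + copiedC + ", " + copiedD);
    }
}
```

Running main prints "shared plans: 3, 6" (the second execution also counts the first one's rows) and "copied plans: 3, 3".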

I've created a JIRA issue and submitted a patch, JENA-120:
https://issues.apache.org/jira/browse/JENA-120

As a workaround until the patch is applied, I think you can synchronize
around calls to QueryExecutionFactory.create().  Alternatively, you can
choose not to cache GROUP BY queries (test for this with Query.hasGroupBy()).
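Both workarounds can be combined in a small cache. This is a sketch under assumptions: QueryCache, getQuery and withCompileLock are hypothetical names, and the parse function and the "shareable" test are passed in so the block compiles without Jena on the classpath. With ARQ they would presumably be QueryFactory::create and q -> !q.hasGroupBy().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper; Q stands for the parsed query type (Query in ARQ).
public final class QueryCache<Q> {
    private final Map<String, Q> cache = new ConcurrentHashMap<>();
    private final Function<String, Q> parse;  // assumed wiring: QueryFactory::create
    private final Predicate<Q> shareable;     // assumed wiring: q -> !q.hasGroupBy()
    private final Object compileLock = new Object();

    public QueryCache(Function<String, Q> parse, Predicate<Q> shareable) {
        this.parse = parse;
        this.shareable = shareable;
    }

    // Workaround 2: reuse parsed queries, but never cache ones that are unsafe to share.
    public Q getQuery(String queryString) {
        Q cached = cache.get(queryString);
        if (cached != null) {
            return cached;
        }
        Q parsed = parse.apply(queryString);
        if (!shareable.test(parsed)) {
            return parsed; // every caller gets its own freshly parsed object
        }
        // If another thread cached the same string first, return its instance.
        Q raced = cache.putIfAbsent(queryString, parsed);
        return raced != null ? raced : parsed;
    }

    // Workaround 1: serialize the step that compiles a shared query into a plan.
    public <R> R withCompileLock(Q query, Function<? super Q, ? extends R> step) {
        synchronized (compileLock) {
            return step.apply(query);
        }
    }

    public int size() {
        return cache.size();
    }
}
```

With ARQ, the locked step would be something like q -> QueryExecutionFactory.create(q, model). Note that in the stack trace quoted below, the plan is built inside execSelect() rather than in create(), so the locked step may need to cover the execSelect() call as well.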

I don't know whether other issues might prevent reusing Query objects;
maybe Andy can chime in here.

-Stephen

P.S.  Your strategy of caching Query objects does avoid reparsing the query
string, which can be quite beneficial.  Along the same lines, a more valuable
enhancement to ARQ would be a mechanism for caching query plans after the
optimizer step.  Query optimization itself can get quite expensive: for n
joined patterns there are n! left-deep join trees to consider, and even more
bushy trees.
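The growth of that search space can be made concrete. The sketch below uses the standard textbook counts, assuming n relations (or triple patterns) to join and no pruning of cross products: n! left-deep trees, and n! times Catalan(n-1), i.e. (2n-2)!/(n-1)!, bushy trees. It only counts candidate trees; it does not model any particular optimizer, and the class name is hypothetical.

```java
import java.math.BigInteger;

// Counts candidate join trees for n relations, assuming every join order is allowed.
public final class JoinOrderCount {

    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Left-deep trees: one tree per permutation of the n relations, i.e. n!.
    static BigInteger leftDeep(int n) {
        requirePositive(n);
        return factorial(n);
    }

    // Bushy trees: n! leaf orderings times Catalan(n-1) shapes = (2n-2)! / (n-1)!.
    static BigInteger bushy(int n) {
        requirePositive(n);
        return factorial(2 * n - 2).divide(factorial(n - 1));
    }

    private static void requirePositive(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 10; n += 2) {
            System.out.printf("n=%2d  left-deep=%s  bushy=%s%n", n, leftDeep(n), bushy(n));
        }
    }
}
```

For n = 4 this gives 24 left-deep and 120 bushy trees; by n = 10 the bushy count is already over 17 billion.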



> -----Original Message-----
> From: Holger Knublauch [mailto:[email protected]]
> Sent: Tuesday, September 20, 2011 1:14 AM
> To: [email protected]
> Subject: Aggregators and concurrent use of Query object
> 
> Hi Andy,
> 
> we have intermittently run into exceptions like the one below, and my
> suspicion is that the ARQ Query class is not meant to be reused by
> multiple threads. Although each step in the Query is converted into a
> corresponding Algebra object for execution, the Aggregators seem to be
> shared between multiple objects. Is this correct, and do I need to
> create a new Query each time I want a QueryExecution? That would slow
> things down quite a lot, as we currently cache all Queries that were
> created from their string representation. If this is the case, is there
> any way to tell which particular queries are not thread-safe, e.g. all
> queries involving aggregations?
> 
> If I am totally off the mark, do you know what else could cause the
> exception below, which occurs only sometimes and only under multi-threading?
> 
> Thank you,
> Holger
> 
> 
> com.hp.hpl.jena.sparql.ARQInternalErrorException: Null for accumulator
>       at com.hp.hpl.jena.sparql.expr.aggregate.AggregatorBase.getValue(AggregatorBase.java:61)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIterGroup.calc(QueryIterGroup.java:121)
>       at com.hp.hpl.jena.sparql.engine.iterator.QueryIterGroup.<init>(QueryIterGroup.java:32)
>       at com.hp.hpl.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:413)
>       at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:255)
>       at com.hp.hpl.jena.sparql.algebra.op.OpGroup.visit(OpGroup.java:37)
>       at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:33)
>       at com.hp.hpl.jena.sparql.engine.main.OpExecutor.executeOp(OpExecutor.java:107)
>       at com.hp.hpl.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:441)
>       at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:241)
>       at com.hp.hpl.jena.sparql.algebra.op.OpExtend.visit(OpExtend.java:107)
>       at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:33)
>       at com.hp.hpl.jena.sparql.engine.main.OpExecutor.executeOp(OpExecutor.java:107)
>       at com.hp.hpl.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:393)
>       at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.visit(ExecutionDispatch.java:213)
>       at com.hp.hpl.jena.sparql.algebra.op.OpProject.visit(OpProject.java:34)
>       at com.hp.hpl.jena.sparql.engine.main.ExecutionDispatch.exec(ExecutionDispatch.java:33)
>       at com.hp.hpl.jena.sparql.engine.main.OpExecutor.executeOp(OpExecutor.java:107)
>       at com.hp.hpl.jena.sparql.engine.main.OpExecutor.execute(OpExecutor.java:80)
>       at com.hp.hpl.jena.sparql.engine.main.QC.execute(QC.java:40)
>       at com.hp.hpl.jena.sparql.engine.main.QueryEngineMain.eval(QueryEngineMain.java:52)
>       at com.hp.hpl.jena.sparql.engine.QueryEngineBase.evaluate(QueryEngineBase.java:138)
>       at com.hp.hpl.jena.sparql.engine.QueryEngineBase.createPlan(QueryEngineBase.java:109)
>       at com.hp.hpl.jena.sparql.engine.QueryEngineBase.getPlan(QueryEngineBase.java:97)
>       at com.hp.hpl.jena.sparql.engine.main.QueryEngineMain$1.create(QueryEngineMain.java:91)
>       at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.getPlan(QueryExecutionBase.java:266)
>       at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.startQueryIterator(QueryExecutionBase.java:243)
>       at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execResultSet(QueryExecutionBase.java:248)
>       at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execSelect(QueryExecutionBase.java:94)
>       at org.topbraid.spin.arq.SPINARQFunction.executeBody(SPINARQFunction.java:121)
