On 01/02/12 17:10, Laurent Pellegrino wrote:
Hi,

I compared two solutions regarding the creation of the algebra (one graph,
with one BGP containing 7 triple patterns and a filter with 9 conditions):

a) It consists of creating the algebra (by instantiating around 50
Java objects) each time I receive a quadruple.

b) It uses a template and placeholders as you suggested. The template
is created once. Placeholders are represented by using Node_Var with a
dedicated name (there are 4 variables). Each time a quadruple is
received, the template is rewritten by using
NodeTransformLib#transform with a custom NodeTransform that checks the
name of each Node_Var and replaces it with the desired value if that
name is one of the placeholders.
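
In code, the per-quadruple rewriting step looks roughly like this (simplified
to two placeholders, with made-up values; written against current Apache Jena
class names, so the packages and the NodeTransform method differ slightly from
the ARQ release I am actually using):

    import org.apache.jena.graph.NodeFactory;
    import org.apache.jena.query.QueryFactory;
    import org.apache.jena.sparql.algebra.Algebra;
    import org.apache.jena.sparql.algebra.Op;
    import org.apache.jena.sparql.graph.NodeTransform;
    import org.apache.jena.sparql.graph.NodeTransformLib;

    // Template built once; the placeholders are ordinary variables with reserved names.
    Op template = Algebra.compile(QueryFactory.create(
        "SELECT ?g WHERE { GRAPH ?g { ?ph_s ?p ?ph_o } FILTER(?ph_s != ?ph_o) }"));

    // For each incoming quadruple, substitute the placeholders with concrete nodes.
    NodeTransform bind = n -> {
        if (n.isVariable() && n.getName().equals("ph_s"))
            return NodeFactory.createURI("http://example.org/s");   // value from the quadruple
        if (n.isVariable() && n.getName().equals("ph_o"))
            return NodeFactory.createURI("http://example.org/o");
        return n;                                                    // anything else is left alone
    };
    Op rewritten = NodeTransformLib.transform(bind, template);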

I was thinking that the solution you (Andy) suggested, b), was better
than a) because, if I have understood how Transform works, several
object instantiations can be avoided. Unfortunately, after running both
solutions 10^6 times (with JDK7), solution b) is 11% slower than a).

The ways of actual Java performance are a bit of a mystery to me :-) Object creation/deletion can be fast or slow depending on what the JIT works out for scope. And for code, CPU caches play a bigger part than I'd expect and "final" less.

If you get the chance, profiling each case would be informative - maybe there's a hot spot somewhere. It's quite easy to have an extra string "+" in some loop, or touch Locale indirectly (that's really expensive!).

        Andy



Laurent

On Tue, Jan 31, 2012 at 3:54 PM, Laurent Pellegrino
<[email protected]>  wrote:
Thanks for the information and advice.

Which storage layer are you using?

I am using TDB with DatasetGraph and transactions.

Laurent

On Tue, Jan 31, 2012 at 12:15 PM, Andy Seaborne<[email protected]>  wrote:
On 30/01/12 16:02, Laurent Pellegrino wrote:

Hi all,

Some context to understand why I am asking for more information: I have an
application where, each time it is called, a new SPARQL query (as a
String) is created from a template and a quadruple (a Java
object). This means that the interesting values from the quadruple have to
be serialized with a node formatter and put into the SPARQL
template. Then the SPARQL query has to be parsed and the result has
to be evaluated against a dataset.
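
To make that concrete, what the application does today is roughly the
following sketch (the template, node values and in-memory dataset are only
illustrative stand-ins; class names are from current Apache Jena):

    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.NodeFactory;
    import org.apache.jena.query.*;
    import org.apache.jena.sparql.util.FmtUtils;

    Dataset dataset = DatasetFactory.create();          // stand-in for the real TDB dataset
    // Hypothetical template; the real one has 7 patterns and 9 filter conditions.
    String template = "SELECT ?g WHERE { GRAPH ?g { %s ?p %s } }";

    // Values taken from the incoming quadruple.
    Node s = NodeFactory.createURI("http://example.org/s");
    Node o = NodeFactory.createURI("http://example.org/o");

    // Serialise the nodes with a node formatter, splice them into the template,
    // then parse and evaluate: the parsing step is the one I would like to skip.
    String queryString = String.format(template,
            FmtUtils.stringForNode(s), FmtUtils.stringForNode(o));
    Query query = QueryFactory.create(queryString);
    try (QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) {
        ResultSetFormatter.consume(qExec.execSelect());
    }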

I was wondering whether I can skip these parsing steps to save some
time during the execution of the application. It seems it is possible
by working at the algebra level.

If some optimizations are done on queries by Jena, are they done
before the evaluation or after parsing? I mean, when I give a query to
Algebra.exec(...), is it always optimized via a call to
Algebra#optimize?


Optimizations are done at the start of execution.

They happen at the point when QueryEngineBase calls modifyOp.

In QueryEngineBase, modifyOp just returns the op unchanged.

In QueryEngineMain, used by ARQ for general evaluation (mainly in-memory),
modifyOp is a call to Algebra.optimize.

QueryEngineTDB extends QueryEngineMain.  It calls super.modifyOp and does
some additional stuff.

QueryEngineSDB inherits from QueryEngineBase, so it does nothing.  Its
processing is done earlier (for historical reasons) in QueryEngineSDB.init and it
calls a couple of optimizations directly.  It does not want the join
optimizations.

You can replace the optimizer, even on a per-execution basis: see
ARQConstants.sysOptimizerFactory and Optimize.decideOptimizer.  Or turn off
various optimizations by setting symbols in the context.  See
Optimize.rewrite.
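
For example, switching off just one of them (filter placement, picked only as
an illustration) for a single execution looks roughly like this, using current
Apache Jena class names:

    import org.apache.jena.query.*;

    Dataset dataset = DatasetFactory.create();     // any dataset
    Query query = QueryFactory.create("SELECT * WHERE { ?s ?p ?o FILTER(?s != ?o) }");

    try (QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) {
        // Turn off one optimization (filter placement) for this execution only.
        qExec.getContext().set(ARQ.optFilterPlacement, false);
        ResultSetFormatter.consume(qExec.execSelect());
    }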


Is there any builder to ease the construction of algebra?


One way might be to construct the algebra using placeholders (well-known
nodes), then use a Transform to change it.
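
A sketch of that, with current Apache Jena class names (the one-triple pattern
here is just a stand-in; OpFilter, OpJoin etc. are built the same way):

    import org.apache.jena.graph.Triple;
    import org.apache.jena.sparql.algebra.Algebra;
    import org.apache.jena.sparql.algebra.Op;
    import org.apache.jena.sparql.algebra.op.OpBGP;
    import org.apache.jena.sparql.algebra.op.OpGraph;
    import org.apache.jena.sparql.core.BasicPattern;
    import org.apache.jena.sparql.core.DatasetGraphFactory;
    import org.apache.jena.sparql.core.Var;
    import org.apache.jena.sparql.engine.QueryIterator;

    // Build the algebra by hand: a single BGP inside a GRAPH block, with a
    // placeholder variable (?ph_s) to be substituted later by a Transform.
    BasicPattern bgp = new BasicPattern();
    bgp.add(Triple.create(Var.alloc("ph_s"), Var.alloc("p"), Var.alloc("o")));
    Op op = new OpGraph(Var.alloc("g"), new OpBGP(bgp));

    // Evaluate directly against a dataset: no SPARQL string, no parser.
    QueryIterator qIter = Algebra.exec(op, DatasetGraphFactory.create());
    while (qIter.hasNext())
        qIter.nextBinding();
    qIter.close();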


I have also seen in a wiki page [1] that it is possible to work at the
syntax level. Do you think it is better to work at the syntax level or at
the algebra level to do what I want?


Algebra.



[1]
http://incubator.apache.org/jena/documentation/query/manipulating_sparql_using_arq.html

Kind Regards,

Laurent


Which storage layer are you using?

        Andy
