Re: More information on query execution (with algebra)

Laurent Pellegrino Wed, 01 Feb 2012 09:11:07 -0800

Hi,

I compared two approaches to creating the algebra (one graph containing
one BGP of 7 triple patterns and a filter with 9 conditions):

a) Build the algebra from scratch (instantiating around 50 Java
objects) each time I receive a quadruple.

b) Use a template with placeholders, as you suggested. The template is
created once. Placeholders are represented by Node_Var instances with
dedicated names (there are 4 variables). Each time a quadruple is
received, the template is rewritten using NodeTransformLib#transform
with a custom NodeTransform that checks the name of each Node_Var and
replaces it with the desired value if that name is one of the
placeholders.
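
Both approaches can be modelled in a few lines of plain Java. This is a minimal sketch, not Jena code: Node, Var, Uri, Triple, Equals, Bgp and Filter are hypothetical stand-ins for ARQ's Node_Var/Node_URI, Triple, OpBGP and OpFilter, the expression is cut down to one triple pattern and one filter condition, and the placeholder names (__ph_graph, __ph_subject) are made up. The instantiate method plays the role of NodeTransformLib#transform with the custom NodeTransform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for ARQ's Node_Var/Node_URI, Triple, OpBGP and OpFilter,
// cut down to one triple pattern and one filter condition.
interface Node {}
record Var(String name) implements Node {}
record Uri(String iri) implements Node {}
record Triple(Node s, Node p, Node o) {}
record Equals(Node left, Node right) {}          // one filter condition
record Bgp(List<Triple> patterns) {}
record Filter(List<Equals> exprs, Bgp sub) {}

final class AlgebraTemplates {
    // a) Build the whole expression from scratch for each incoming quadruple.
    static Filter buildFresh(Node graph, Node subject) {
        Bgp bgp = new Bgp(List.of(
                new Triple(subject, new Uri("http://example.org/p"), new Var("x"))));
        return new Filter(List.of(new Equals(new Var("g"), graph)), bgp);
    }

    // b) The same shape built once, with placeholder variables (hypothetical names).
    static final Filter TEMPLATE =
            buildFresh(new Var("__ph_graph"), new Var("__ph_subject"));

    // Rewrite the template: a Var whose name is a bound placeholder is replaced,
    // every other node is kept as-is (the role of the custom NodeTransform).
    static Filter instantiate(Filter template, Map<String, Node> values) {
        UnaryOperator<Node> transform = n ->
                (n instanceof Var v && values.containsKey(v.name())) ? values.get(v.name()) : n;
        List<Triple> triples = new ArrayList<>();
        for (Triple t : template.sub().patterns()) {
            triples.add(new Triple(transform.apply(t.s()), transform.apply(t.p()),
                    transform.apply(t.o())));
        }
        List<Equals> exprs = new ArrayList<>();
        for (Equals e : template.exprs()) {
            exprs.add(new Equals(transform.apply(e.left()), transform.apply(e.right())));
        }
        // A new Triple, Equals, Bgp and Filter is still allocated for every call.
        return new Filter(exprs, new Bgp(triples));
    }
}
```

In this model b) reuses the unchanged leaf nodes, but it still allocates a new Triple, Equals, Bgp and Filter for every quadruple and adds a lookup per node, so it is not guaranteed to be cheaper than a).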

I expected solution b), which you (Andy) suggested, to be faster than
a) because, if I have understood how Transform works, several object
instantiations can be avoided. Unfortunately, after running both
solutions 10^6 times (with JDK7), solution b) is 11% slower than a).
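
A percentage like this depends on how the loop is timed: untimed warm-up runs, dead-code elimination and GC can all skew a naive measurement. A minimal sketch of a fairer hand-rolled comparison, assuming each approach is wrapped as a Supplier; MicroBench and its methods are hypothetical names, and a harness such as JMH handles these pitfalls more thoroughly.

```java
import java.util.function.Supplier;

// A minimal hand-rolled timing harness (hypothetical); JMH is the more robust tool.
final class MicroBench {
    // Results are written here so the JIT cannot discard the work as dead code.
    static volatile Object sink;

    // Runs the task `warmup` times untimed (giving the JIT a chance to compile it),
    // then returns the elapsed nanoseconds for `iterations` further runs.
    static long timeNanos(Supplier<?> task, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            sink = task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = task.get();
        }
        return System.nanoTime() - start;
    }

    // Relative cost of b) against a): 0.11 means b) took 11% longer.
    static double relativeSlowdown(long nanosA, long nanosB) {
        return (double) (nanosB - nanosA) / nanosA;
    }
}
```

With a) and b) each wrapped as a Supplier and timed separately, relativeSlowdown(timeA, timeB) gives the kind of figure reported above.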

Laurent

On Tue, Jan 31, 2012 at 3:54 PM, Laurent Pellegrino
<[email protected]> wrote:
> Thanks for the information and advice.
>
>> Which storage layer are you using?
>
> I am using TDB with DatasetGraph and transactions.
>
> Laurent
>
> On Tue, Jan 31, 2012 at 12:15 PM, Andy Seaborne <[email protected]> wrote:
>> On 30/01/12 16:02, Laurent Pellegrino wrote:
>>>
>>> Hi all,
>>>
>>> Some context to explain why I am asking for more information: I have
>>> an application where, each time it is called, a new SPARQL query (as a
>>> String) is created from a template and a quadruple (a Java object).
>>> This means that the relevant values from the quadruple have to be
>>> converted with a node formatter so they can be inserted into the SPARQL
>>> template. Then the SPARQL query has to be parsed and the resulting
>>> query evaluated against a dataset.
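
For comparison, the string-based pipeline described above can be sketched as follows. SparqlStringTemplate, its methods and the ${name} marker syntax are hypothetical; they stand in for the node formatter and only handle absolute IRIs and plain string literals. The resulting string would still have to go through a parser such as QueryFactory.create, which is the cost the algebra approach is meant to avoid.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the string-based pipeline: format each value as SPARQL
// text, then splice it into the template. The result must still be parsed.
final class SparqlStringTemplate {
    private static final Pattern MARKER = Pattern.compile("\\$\\{(\\w+)\\}");

    // An absolute IRI is written between angle brackets (no IRI validation here).
    static String formatIri(String iri) {
        return "<" + iri + ">";
    }

    // A plain string literal, escaping characters that would break the query text.
    static String formatLiteral(String lexical) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : lexical.toCharArray()) {
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '"' -> sb.append("\\\"");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Replaces each ${name} marker in a single pass, so inserted values are never rescanned.
    static String fill(String template, Map<String, String> formatted) {
        Matcher m = MARKER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = formatted.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```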
>>>
>>> I was wondering whether I can skip these parsing steps to save some
>>> time during the execution of the application. It seems possible by
>>> working at the algebra level.
>>>
>>> If Jena performs optimizations on queries, are they applied right
>>> after parsing or only when evaluation starts? I mean, when I pass a
>>> query to Algebra.exec(...), is it always optimized via a call to
>>> Algebra#optimize?
>>
>>
>> Optimizations are done at the start of execution.
>>
>> They happen at the point when QueryEngineBase calls modifyOp.
>>
>> In QueryEngineBase, modifyOp just returns the op unchanged.
>>
>> In QueryEngineMain, which ARQ uses for general evaluation (mainly
>> in-memory), modifyOp is a call to Algebra.optimize.
>>
>> QueryEngineTDB extends QueryEngineMain.  It calls super.modifyOp and does
>> some additional stuff.
>>
>> QueryEngineSDB inherits from QueryEngineBase, so its modifyOp does
>> nothing. Its processing is done earlier (for historical reasons) in
>> QueryEngineSDB.init, where it calls a couple of optimizations directly.
>> It does not want the join optimizations.
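
The hook described here is a template-method pattern: execution always calls modifyOp, and each engine decides what that call does. A minimal model, assuming an op can be reduced to the list of steps applied to it; EngineBase, EngineMain, EngineTdbLike and EngineSdbLike are hypothetical stand-ins, not the real QueryEngineBase/QueryEngineMain/QueryEngineTDB/QueryEngineSDB classes, and "storage-specific rewrite" is only a placeholder for TDB's additional work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the modifyOp hook; an "op" is just the list of steps applied.
class EngineBase {
    // Called at the start of execution; the base engine leaves the op unchanged.
    protected List<String> modifyOp(List<String> op) {
        return op;
    }

    final List<String> execute(List<String> op) {
        List<String> prepared = modifyOp(op);
        // Evaluation of `prepared` would happen here.
        return prepared;
    }
}

class EngineMain extends EngineBase {
    @Override
    protected List<String> modifyOp(List<String> op) {
        List<String> out = new ArrayList<>(op);
        out.add("Algebra.optimize");               // stands in for the general optimizer
        return out;
    }
}

class EngineTdbLike extends EngineMain {
    @Override
    protected List<String> modifyOp(List<String> op) {
        List<String> out = new ArrayList<>(super.modifyOp(op));  // general optimizations first
        out.add("storage-specific rewrite");                      // then the extra work
        return out;
    }
}

class EngineSdbLike extends EngineBase {
    // Inherits the no-op modifyOp: in the model, its own rewrites would run
    // earlier, during setup, and skip the join optimizations.
}
```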
>>
>> You can replace the optimizer, even on a per-execution basis: see
>> ARQConstants.sysOptimizerFactory and Optimize.decideOptimizer. Or turn
>> off individual optimizations by setting symbols in the context; see
>> Optimize.rewrite.
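
The selection logic can be pictured with a per-execution context map: a factory stored in the context overrides the default optimizer, and an individual rewrite checks a flag before running. Everything in this sketch (OptimizerChoice, the key strings, the rewrite names placeFilters and reorderJoins) is a hypothetical model of the idea, not ARQ's API; in ARQ the entry points are the ones named above, and context keys are Symbol objects.

```java
import java.util.Map;

// Hypothetical model of per-execution optimizer selection; not ARQ's API.
final class OptimizerChoice {
    // Illustrative context keys (ARQ uses Symbol keys such as ARQConstants.sysOptimizerFactory).
    static final String OPTIMIZER_FACTORY = "optimizerFactory";
    static final String FILTER_PLACEMENT = "filterPlacement";

    interface Optimizer { String rewrite(String op); }
    interface OptimizerFactory { Optimizer create(Map<String, Object> context); }

    // Default optimizer: each rewrite runs unless the context switches it off.
    // The rewrite names only label what happened to the op.
    static final OptimizerFactory DEFAULT = ctx -> op -> {
        String out = op;
        if (!Boolean.FALSE.equals(ctx.get(FILTER_PLACEMENT))) {
            out = "placeFilters(" + out + ")";
        }
        return "reorderJoins(" + out + ")";
    };

    // A factory stored in this execution's context wins over the default.
    static Optimizer decideOptimizer(Map<String, Object> context) {
        Object f = context.get(OPTIMIZER_FACTORY);
        OptimizerFactory factory = (f instanceof OptimizerFactory custom) ? custom : DEFAULT;
        return factory.create(context);
    }
}
```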
>>
>>
>>> Is there a builder to ease the construction of algebra expressions?
>>
>>
>> One way might be to construct the algebra using placeholders (well-known
>> nodes), then use a Transform to change it.
>>
>>
>>> I have also seen on a wiki page [1] that it is possible to work at the
>>> syntax level. Do you think it is better to work at the syntax level or
>>> at the algebra level for what I want to do?
>>
>>
>> Algebra.
>>
>>
>>>
>>> [1]
>>> http://incubator.apache.org/jena/documentation/query/manipulating_sparql_using_arq.html
>>>
>>> Kind Regards,
>>>
>>> Laurent
>>
>>
>> Which storage layer are you using?
>>
>>        Andy
