Thank you for the hint, Andy, but that is not quite what I was looking for. I was aiming for the kind of feature I know from purely functional programming languages such as Haskell, Hugs, and Miranda, which display the deductions and cells used during execution.
Marco

On Sun, Mar 8, 2020 at 10:42 AM Andy Seaborne wrote:
>
> On 06/03/2020 17:40, Marco Neumann wrote:
> > Is there statistical data available for the number of deductions /
> > joins performed for each SPARQL query of a QueryExecution object?
>
> If you run with "explain" you can find out, but there isn't a specific
> record kept by the code.
>
> > On Fri, Mar 6, 2020 at 3:16 PM Andy Seaborne wrote:
> >
> >> On 05/03/2020 08:32, Kashif Rabbani wrote:
> >>> Hi Andy,
> >>>
> >>> Thanks for your response. I was wondering if there is any detailed
> >>> documentation of the Jena optimization (rewriting & reordering)
> >>> available online? If yes, can you please send me the reference?
> >>
> >> The code mainly.
> >>
> >> The TDB stats is documented.
> >>
> >>> Also, if I create my own query plan (in algebraic form), is it possible
> >>> to make Jena execute it as it is? I mean how to turn off Jena's
> >>> optimization (rewriting & reordering) and force my query plan for
> >>> execution.
> >>
> >> Yes - two parts - algebra rewrites and BGP reordering.
> >>
> >> The context is a mapping of settings. There is:
> >>   a global context (ARQ.getContext())
> >>   one per dataset (DatasetGraph.getContext())
> >>   one per query execution (QueryExecution.getContext())
> >>
> >> and it is treated hierarchically:
> >>
> >> Lookup is in QueryExecution, then DatasetGraph, then the global context.
> >>
> >> :: Algebra rewrite
> >>
> >> Some algebra rewrites have to be done - property functions, and rewriting
> >> some variables due to scoping. These aren't really "optimization steps"
> >> but happen in that phase. There is OptimizerMinimal for those.
> >>
> >> To turn off the optimizer and still do the minimum steps:
> >>
> >>   context.set(ARQ.optimization, false)
> >>
> >> Alternatively, Algebra.exec(op, dsg) executes the algebra as given -
> >> that's a very low-level way of doing it.
> >>
> >> Turning the optimizer off is better because all the APIs work,
> >> e.g. QueryExecution.
> >>
> >> :: BGP reordering
> >>
> >> The reordering of triple patterns is separate.
> >> BGP steps are performed by a StageGenerator.
> >>
> >> To set up to use a custom StageGenerator:
> >>
> >>   StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ;
> >>
> >> That's really only a call of
> >>   context.set(ARQ.stageGenerator, myStageGenerator) ;
> >>
> >> The default is StageGeneratorGeneric, which does ReorderFixed.
> >> It is used if there is no other setting in the context.
> >>
> >>     Andy
> >>
> >>>
> >>> Thanks again for your help.
> >>>
> >>> Regards,
> >>>
> >>> Kashif Rabbani,
> >>> Research Assistant,
> >>> Department of Computer Science,
> >>> Aalborg University, Denmark.
> >>>
> >>>> On 3 Mar 2020, at 13.43, Andy Seaborne wrote:
> >>>>
> >>>> Hi Kashif,
> >>>>
> >>>> Optimization happens in two stages:
> >>>>
> >>>> 1. Rewrite of the algebra
> >>>> 2. Reordering of the BGPs
> >>>>
> >>>> BGPs can be implemented in different ways, and they are an inference
> >>>> extension point in SPARQL.
> >>>>
> >>>> What you see is the first. BGPs are reordered during execution.
> >>>>
> >>>> The algorithm can be stats driven for TDB and TDB2 storage:
> >>>> https://jena.apache.org/documentation/tdb/optimizer.html
> >>>>
> >>>> The interface is
> >>>> org.apache.jena.sparql.engine.optimizer.reorder.ReorderTransformation
> >>>>
> >>>> and a general purpose reordering is done for in-memory and is the
> >>>> default for TDB.
> >>>>
> >>>> The default reorder is "grounded triples first, leave equal weights
> >>>> alone". It cascades whether a term is bound by an earlier step.
> >>>>
> >>>>> { ?a mbz:alias "Amy Beach" .
> >>>>>   ?b cmno:hasInfluenced ?a .
> >>>>>   ?c mo:composer ?b ;
> >>>>>      bio:date ?d
> >>>>> }
> >>>>
> >>>> That's actually the default order -
> >>>>
> >>>>   ?a mbz:alias "Amy Beach" .
> >>>>
> >>>> has two bound terms so is done first.
> >>>>
> >>>> and now ?a is bound so
> >>>>   ?b cmno:hasInfluenced ?a .
> >>>>
> >>>> etc.
> >>>>
> >>>> Given the boundedness of the pattern, and (guess) mbz:alias "Amy Beach"
> >>>> is quite selective, with stats ? <property> ? would have to be less
> >>>> numerous than ? mbz:alias "Amy Beach".
> >>>>
> >>>> There's no algebra optimization for your example, only BGP reordering.
> >>>>
> >>>> qparse --print=opt shows stage 1 optimizations.
> >>>>
> >>>> Executing with "explain" shows BGP execution.
> >>>>
> >>>>     Andy
> >>>>
> >>>> On 03/03/2020 11:56, Kashif Rabbani wrote:
> >>>>> Hi awesome community,
> >>>>> I have a question. I am working on optimizing SPARQL query plans, and I
> >>>>> wonder whether the order of triple patterns in the WHERE clause affects
> >>>>> the query plan.
> >>>>> For example, given the following query:
> >>>>> PREFIX bio: <http://purl.org/vocab/bio/0.1/>
> >>>>> PREFIX mo: <http://purl.org/ontology/mo/>
> >>>>> PREFIX mbz: <http://dbtune.org/musicbrainz/resource/vocab/>
> >>>>> PREFIX cmno: <http://purl.org/ontology/classicalmusicnav#>
> >>>>> SELECT ?a ?b ?c
> >>>>> WHERE
> >>>>> { ?a mbz:alias "Amy Beach" .
> >>>>>   ?b cmno:hasInfluenced ?a .
> >>>>>   ?c mo:composer ?b ;
> >>>>>      bio:date ?d
> >>>>> }
> >>>>> // Let's generate its algebra
> >>>>> Op op = Algebra.compile(query); results in this:
> >>>>> (project (?a ?b ?c)
> >>>>>   (bgp
> >>>>>     (triple ?a <http://dbtune.org/musicbrainz/resource/vocab/alias> "Amy Beach")
> >>>>>     (triple ?b <http://purl.org/ontology/classicalmusicnav#hasInfluenced> ?a)
> >>>>>     (triple ?c <http://purl.org/ontology/mo/composer> ?b)
> >>>>>     (triple ?c <http://purl.org/vocab/bio/0.1/date> ?d)
> >>>>>   ))
> >>>>> The BGP in the algebra follows exactly the same order as specified in the
> >>>>> WHERE clause of the query. More precisely, does Jena construct the query
> >>>>> plan as is, or will it change the order at some other level?
> >>>>> I would be happy if someone could guide me on how Jena's plan is
> >>>>> actually constructed. If I use statistics of the actual RDF graph
> >>>>> to change the order of triple patterns in the BGP based on selectivity,
> >>>>> would that optimize the plan?
> >>>>> Many Thanks,
> >>>>> Best Regards,
> >>>>> Kashif Rabbani.

--
---
Marco Neumann
KONA
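
The default reorder described in the quoted thread ("grounded triples first, leave equal weights alone", cascading on terms bound by an earlier step) can be illustrated with a toy sketch. This is plain Java with no Jena dependency, and it is not Jena's ReorderFixed implementation: the class name ReorderSketch, the String[] representation of a triple pattern, and scoring a pattern by its count of bound terms are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy class: illustrates the heuristic only, not Jena's code.
final class ReorderSketch {

    // A triple pattern is a String[3]; a term starting with '?' is a variable.
    static List<String[]> reorder(List<String[]> patterns) {
        List<String[]> remaining = new ArrayList<>(patterns);
        List<String[]> ordered = new ArrayList<>();
        Set<String> boundVars = new HashSet<>();

        while (!remaining.isEmpty()) {
            int best = 0;
            int bestScore = -1;
            for (int i = 0; i < remaining.size(); i++) {
                int score = boundTerms(remaining.get(i), boundVars);
                // Strict '>' keeps the earlier pattern on ties
                // ("leave equal weights alone").
                if (score > bestScore) {
                    bestScore = score;
                    best = i;
                }
            }
            String[] chosen = remaining.remove(best);
            ordered.add(chosen);
            // Variables in the chosen pattern count as bound for later steps.
            for (String term : chosen) {
                if (term.startsWith("?")) {
                    boundVars.add(term);
                }
            }
        }
        return ordered;
    }

    // Counts constants plus variables already bound by an earlier pattern.
    static int boundTerms(String[] triple, Set<String> boundVars) {
        int n = 0;
        for (String term : triple) {
            if (!term.startsWith("?") || boundVars.contains(term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // The four patterns from the thread, deliberately in reverse order.
        List<String[]> input = Arrays.asList(
            new String[] {"?c", "bio:date", "?d"},
            new String[] {"?c", "mo:composer", "?b"},
            new String[] {"?b", "cmno:hasInfluenced", "?a"},
            new String[] {"?a", "mbz:alias", "\"Amy Beach\""});
        for (String[] t : reorder(input)) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

Given the four patterns from the thread in reverse order, the sketch picks mbz:alias first (two bound terms), then cmno:hasInfluenced once ?a is bound, then mo:composer, then bio:date, which is the order Andy describes. A stats-driven reorder for TDB would replace the simple count in boundTerms with weights derived from the data.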
