Re: Order of triple patterns in Where Clause

Marco Neumann Sun, 08 Mar 2020 07:22:34 -0700

Thank you for the hint, Andy, but it's not quite what I was looking for.

I was aiming more for the kind of feature I know from purely functional
programming languages like Haskell (Hugs), Miranda, etc., which can display
the reductions and cells used during execution.


Marco

On Sun, Mar 8, 2020 at 10:42 AM Andy Seaborne <[email protected]> wrote:

>
>
> On 06/03/2020 17:40, Marco Neumann wrote:
> > Is there statistical data available on the number of deductions /
> > joins performed for each SPARQL query of a QueryExecution object?
>
> If you run with "explain" you can find out, but the code doesn't keep a
> specific record.
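Since no such record is kept, one way to collect it yourself is to wrap whatever iterator yields the rows of a step and count what passes through. A minimal sketch, assuming the rows are available as a plain java.util.Iterator; CountingIterator is a hypothetical helper, not a Jena class.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of Jena): wraps any iterator and counts how
// many items, e.g. solution rows leaving one join step, pass through it.
class CountingIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private long count = 0;

    CountingIterator(Iterator<T> inner) {
        this.inner = inner;
    }

    @Override
    public boolean hasNext() {
        return inner.hasNext();
    }

    @Override
    public T next() {
        T item = inner.next(); // NoSuchElementException propagates when exhausted
        count++;
        return item;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        CountingIterator<String> rows =
            new CountingIterator<>(List.of("row1", "row2", "row3").iterator());
        while (rows.hasNext()) {
            rows.next();
        }
        System.out.println("rows produced: " + rows.count()); // rows produced: 3
    }
}
```

Wrapping the output of each step separately gives a per-step count, which is roughly the "joins performed" figure asked about.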
>
> >
> > On Fri, Mar 6, 2020 at 3:16 PM Andy Seaborne <[email protected]> wrote:
> >
> >>
> >>
> >> On 05/03/2020 08:32, Kashif Rabbani wrote:
> >>> Hi Andy,
> >>>
> >>> Thanks for your response. Is there any detailed documentation of
> >>> the Jena optimization (rewriting and reordering) available online?
> >>> If so, could you please send me the reference?
> >>
> >> The code mainly.
> >>
> >> The TDB statistics-based reordering is documented.
> >>
> >>> Also, if I create my own query plan (in algebraic form), is it possible
> >>> to make Jena execute it as it is? I mean, how do I turn off Jena's
> >>> optimization (rewriting and reordering) and force my query plan to be
> >>> executed?
> >>
> >> Yes. There are two parts: algebra rewrites and BGP reordering.
> >>
> >> The context is a mapping of settings. There are three levels:
> >>   the global context:      ARQ.getContext()
> >>   one per dataset:         DatasetGraph.getContext()
> >>   one per query execution: QueryExecution.getContext()
> >>
> >> and they are treated hierarchically:
> >>
> >> lookup is in the QueryExecution context, then the DatasetGraph
> >> context, then the global context.
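The lookup order just described can be modelled as a chain of maps where each level falls back to its parent. This is a simplified sketch, not Jena's Context class; LayeredContext and the string key "optimization" are hypothetical stand-ins for the real context objects and Symbol keys such as ARQ.optimization.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model (not Jena's Context class) of hierarchical lookup:
// query execution first, then dataset, then global.
class LayeredContext {
    private final Map<String, Object> settings = new HashMap<>();
    private final LayeredContext parent; // null for the global level

    LayeredContext(LayeredContext parent) {
        this.parent = parent;
    }

    void set(String key, Object value) {
        settings.put(key, value);
    }

    // Walk up the chain until some level defines the key.
    Object get(String key, Object defaultValue) {
        for (LayeredContext c = this; c != null; c = c.parent) {
            if (c.settings.containsKey(key)) {
                return c.settings.get(key);
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        LayeredContext global = new LayeredContext(null);
        LayeredContext dataset = new LayeredContext(global);
        LayeredContext execution = new LayeredContext(dataset);

        global.set("optimization", true);
        System.out.println(execution.get("optimization", true)); // true (from global)

        execution.set("optimization", false);
        System.out.println(execution.get("optimization", true)); // false (execution wins)
        System.out.println(dataset.get("optimization", true));   // true (dataset unaffected)
    }
}
```

In this model a setting made at the execution level overrides the dataset and global ones for that execution only, without changing them.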
> >>
> >> :: Algebra rewrite
> >>
> >> Some algebra rewrites have to be done: property functions, and
> >> rewriting some variables because of scoping. These aren't really
> >> "optimization steps", but they happen in that phase. OptimizerMinimal
> >> does just those.
> >>
> >> To turn off the optimizer and still do the minimum steps:
> >>
> >> context.set(ARQ.optimization, false)
> >>
> >> Alternatively, Algebra.exec(op, dsg) executes the algebra exactly as
> >> given, but that's a very low-level way of doing it.
> >>
> >> Turning the optimizer off is better because all the usual APIs, e.g.
> >> QueryExecution, still work.
> >>
> >> :: BGP reordering
> >>
> >> The reordering of triple patterns is separate.
> >> BGP steps are performed by a StageGenerator.
> >>
> >> To use a custom StageGenerator:
> >>
> >> StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ;
> >>
> >> That's really just a call to:
> >>      context.set(ARQ.stageGenerator, myStageGenerator) ;
> >>
> >> The default is StageGeneratorGeneric, which uses ReorderFixed.
> >> It is used if there is no other setting in the context.
> >>
> >>       Andy
> >>
> >>>
> >>> Thanks again for your help.
> >>>
> >>> Regards,
> >>>
> >>> Kashif Rabbani,
> >>> Research Assistant,
> >>> Department of Computer Science,
> >>> Aalborg University, Denmark.
> >>>
> >>>> On 3 Mar 2020, at 13.43, Andy Seaborne <[email protected]> wrote:
> >>>>
> >>>> Hi Kashif,
> >>>>
> >>>> Optimization happens in two stages:
> >>>>
> >>>> 1. Rewrite of the algebra
> >>>> 2. Reordering of the BGPs
> >>>>
> >>>> BGPs can be implemented in different ways; they are an inference
> >>>> extension point in SPARQL.
> >>>>
> >>>> What you see is the first stage. BGPs are reordered during execution.
> >>>>
> >>>> The algorithm can be stats-driven for TDB and TDB2 storage:
> >>>>    https://jena.apache.org/documentation/tdb/optimizer.html
> >>>>
> >>>> The interface is
> >>>> org.apache.jena.sparql.engine.optimizer.reorder.ReorderTransformation
> >>>>
> >>>> and a general-purpose reordering is used for in-memory datasets and
> >>>> is the default for TDB.
> >>>>
> >>>> The default reordering is "grounded triples first, leave equal weights
> >>>> alone". It cascades: a variable bound by an earlier step counts as
> >>>> bound when weighing the later patterns.
> >>>>
> >>>>>      { ?a  mbz:alias           "Amy Beach" .
> >>>>>        ?b  cmno:hasInfluenced  ?a .
> >>>>>        ?c  mo:composer         ?b ;
> >>>>>            bio:date            ?d
> >>>>>      }
> >>>>
> >>>> That's actually the default order:
> >>>>
> >>>> ?a  mbz:alias           "Amy Beach" .
> >>>>
> >>>> has two bound terms, so it is done first.
> >>>>
> >>>> Now ?a is bound, so
> >>>> ?b  cmno:hasInfluenced  ?a .
> >>>>
> >>>> comes next, and so on.
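The cascade described above can be sketched as a greedy loop: repeatedly pick the remaining pattern with the most bound terms (constants, or variables bound by an earlier pick), keeping the original order on ties. This is a simplified model that only counts bound terms; Jena's ReorderFixed uses its own weighting, and GreedyReorder is a hypothetical name. Fed the patterns in a scrambled order, it recovers the order of the query above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of "grounded triples first, leave equal weights alone".
// It only counts bound terms; Jena's ReorderFixed uses its own weights.
class GreedyReorder {
    static boolean isVar(String term) {
        return term.startsWith("?");
    }

    // Number of terms that are constants or variables already bound earlier.
    static int boundTerms(String[] pattern, Set<String> bound) {
        int n = 0;
        for (String t : pattern) {
            if (!isVar(t) || bound.contains(t)) n++;
        }
        return n;
    }

    static List<String[]> reorder(List<String[]> patterns) {
        List<String[]> remaining = new ArrayList<>(patterns);
        List<String[]> result = new ArrayList<>();
        Set<String> bound = new HashSet<>();
        while (!remaining.isEmpty()) {
            int best = 0;
            for (int i = 1; i < remaining.size(); i++) {
                // Strictly greater: on ties the earlier pattern keeps its place.
                if (boundTerms(remaining.get(i), bound) > boundTerms(remaining.get(best), bound)) {
                    best = i;
                }
            }
            String[] chosen = remaining.remove(best);
            result.add(chosen);
            // Cascade: its variables count as bound for the later patterns.
            for (String t : chosen) {
                if (isVar(t)) bound.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> bgp = List.of(
            new String[] {"?c", "mo:composer", "?b"},
            new String[] {"?c", "bio:date", "?d"},
            new String[] {"?b", "cmno:hasInfluenced", "?a"},
            new String[] {"?a", "mbz:alias", "\"Amy Beach\""});
        // Prints, one per line:
        // ?a mbz:alias "Amy Beach"
        // ?b cmno:hasInfluenced ?a
        // ?c mo:composer ?b
        // ?c bio:date ?d
        for (String[] p : reorder(bgp)) {
            System.out.println(String.join(" ", p));
        }
    }
}
```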
> >>>>
> >>>> Given the boundness of the patterns, and (a guess) that mbz:alias
> >>>> "Amy Beach" is quite selective: with stats, ? <property> ? would have
> >>>> to be less numerous than ? mbz:alias "Amy Beach" for the order to change.
> >>>>
> >>>> There's no algebra optimization for your example, only BGP reordering.
> >>>>
> >>>> qparse --print=opt shows stage 1 optimizations.
> >>>>
> >>>> Executing with "explain" shows BGP execution.
> >>>>
> >>>>      Andy
> >>>>
> >>>>
> >>>>
> >>>> On 03/03/2020 11:56, Kashif Rabbani wrote:
> >>>>> Hi awesome community,
> >>>>> I have a question. I am working on optimizing SPARQL query plans, and I
> >>>>> wonder whether the order of triple patterns in the WHERE clause affects
> >>>>> the query plan or not.
> >>>>> For example, given the following query:
> >>>>> PREFIX  bio:  <http://purl.org/vocab/bio/0.1/>
> >>>>> PREFIX  mo:   <http://purl.org/ontology/mo/>
> >>>>> PREFIX  mbz:  <http://dbtune.org/musicbrainz/resource/vocab/>
> >>>>> PREFIX  cmno: <http://purl.org/ontology/classicalmusicnav#>
> >>>>> SELECT  ?a ?b ?c
> >>>>> WHERE
> >>>>>     { ?a  mbz:alias           "Amy Beach" .
> >>>>>       ?b  cmno:hasInfluenced  ?a .
> >>>>>       ?c  mo:composer         ?b ;
> >>>>>           bio:date            ?d
> >>>>>     }
> >>>>> // Let's generate its algebra:
> >>>>> Op op = Algebra.compile(query);
> >>>>> // This results in:
> >>>>> (project (?a ?b ?c)
> >>>>>     (bgp
> >>>>>       (triple ?a <http://dbtune.org/musicbrainz/resource/vocab/alias> "Amy Beach")
> >>>>>       (triple ?b <http://purl.org/ontology/classicalmusicnav#hasInfluenced> ?a)
> >>>>>       (triple ?c <http://purl.org/ontology/mo/composer> ?b)
> >>>>>       (triple ?c <http://purl.org/vocab/bio/0.1/date> ?d)
> >>>>>     ))
> >>>>> The BGP in the algebra follows exactly the order specified in the WHERE
> >>>>> clause of the query. To be precise: does Jena construct the query plan
> >>>>> as it is, or will it change the order at some other level?
> >>>>> I would be happy if someone could explain how Jena's plan is actually
> >>>>> constructed. If I use statistics of the actual RDF graph to reorder the
> >>>>> triple patterns in the BGP by selectivity, would that optimize the plan?
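Whether reordering helps depends on how many partial solutions each step produces. The sketch below is a toy nested-loop evaluator over an in-memory list of triples (not Jena's engine) that records the number of partial solutions after each pattern. The dataset and the names BgpOrderDemo, sampleData and stepSizes are hypothetical, built so that "Amy Beach" matches a single resource. Running the same BGP in the written order and in reverse gives the same final answer but very different intermediate sizes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy nested-loop BGP evaluator over in-memory triples (not Jena's engine).
// It records how many partial solutions exist after each triple pattern.
class BgpOrderDemo {
    static boolean isVar(String t) { return t.startsWith("?"); }

    static String[] t(String s, String p, String o) { return new String[] {s, p, o}; }

    // Extend one binding so that `pattern` matches `triple`; null if it can't.
    static Map<String, String> match(String[] pattern, String[] triple, Map<String, String> binding) {
        Map<String, String> out = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            String p = pattern[i];
            if (isVar(p)) {
                String existing = out.get(p);
                if (existing == null) out.put(p, triple[i]);
                else if (!existing.equals(triple[i])) return null;
            } else if (!p.equals(triple[i])) {
                return null;
            }
        }
        return out;
    }

    // Evaluate the patterns in the given order; return the solution count after each step.
    static List<Integer> stepSizes(List<String[]> patterns, List<String[]> data) {
        List<Map<String, String>> current = new ArrayList<>();
        current.add(new HashMap<>()); // start from one empty solution
        List<Integer> sizes = new ArrayList<>();
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : current) {
                for (String[] triple : data) {
                    Map<String, String> ext = match(pattern, triple, b);
                    if (ext != null) next.add(ext);
                }
            }
            current = next;
            sizes.add(current.size());
        }
        return sizes;
    }

    // Hypothetical data: only ex:beach has the alias "Amy Beach".
    static List<String[]> sampleData() {
        return List.of(
            t("ex:beach", "mbz:alias", "\"Amy Beach\""),
            t("ex:other", "mbz:alias", "\"Someone Else\""),
            t("ex:p1", "cmno:hasInfluenced", "ex:beach"),
            t("ex:p2", "cmno:hasInfluenced", "ex:x"),
            t("ex:p3", "cmno:hasInfluenced", "ex:y"),
            t("ex:w1", "mo:composer", "ex:p1"),
            t("ex:w2", "mo:composer", "ex:p2"),
            t("ex:w3", "mo:composer", "ex:p3"),
            t("ex:w4", "mo:composer", "ex:p2"),
            t("ex:w1", "bio:date", "\"1900\""),
            t("ex:w2", "bio:date", "\"1910\""),
            t("ex:w3", "bio:date", "\"1920\""),
            t("ex:w4", "bio:date", "\"1930\""));
    }

    static List<String[]> writtenOrder() {
        return List.of(
            t("?a", "mbz:alias", "\"Amy Beach\""),
            t("?b", "cmno:hasInfluenced", "?a"),
            t("?c", "mo:composer", "?b"),
            t("?c", "bio:date", "?d"));
    }

    public static void main(String[] args) {
        List<String[]> reversed = new ArrayList<>(writtenOrder());
        Collections.reverse(reversed);
        System.out.println("written order:  " + stepSizes(writtenOrder(), sampleData())); // [1, 1, 1, 1]
        System.out.println("reversed order: " + stepSizes(reversed, sampleData()));       // [4, 4, 4, 1]
    }
}
```

Both orders end with one solution, but the reversed order carries 4 partial solutions through three steps while the written order never carries more than 1. With indexes the absolute costs differ, but keeping these intermediate sizes small is what selectivity-based reordering aims for.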
> >>>>> Many Thanks,
> >>>>> Best Regards,
> >>>>> Kashif Rabbani.
> >>>
> >>
> >
> >
>


-- 


---
Marco Neumann
KONA
