Re: Learning from a SPARQL Query Optimizer webpage.

Matthias Broecheler Wed, 24 Jun 2015 10:58:50 -0700

I liked this textbook from back in the day:
http://www.db-book.com/


Though, it contains a lot of information and only one chapter on query
optimization.

On Wed, Jun 24, 2015 at 6:27 AM Marko Rodriguez <[email protected]>
wrote:

> Hey Rob,
>
> I didn't know you wrote that page. Thanks for doing so.
>
> You don't know how hard it is to find good practical information on
> database query optimizers. Yesterday I was like "let me just go to Amazon
> and buy a 'theory and practice of query optimization'-style book." Nothing.
> Bunch of weird "Optimize your MS Access Queries in 5 Minutes!"-type books
> and then a bunch of "A relational tuple projection over the space of reals
> is a manifold isomorphic to the tensor representation of that same tuple
> within the Levinsky-Portrokov metric space"-style books.
>
> If you know of any solid books that are not too "Oracle DB2 does it like
> this…" and more general to any designed query language and data structure,
> that would be a nice tid-bit to share.
>
> Thanks again,
> Marko.
>
> http://markorodriguez.com
>
> On Jun 24, 2015, at 1:05 AM, Rob Vesse <[email protected]> wrote:
>
> > Nice
> >
> > Glad to see I occasionally write some documentation that people find
> useful
> >
> > Rob
> >
> > On 23/06/2015 21:01, "Marko Rodriguez" <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> Sorry for the SPAM. However, here are some runtimes of the new
> >> CountMatchAglorithm vs. old CountMatchAlgorithm vs. old MatchStep.
> >>
> >>      https://gist.github.com/okram/d49e1abf48fdc18f77f9
> >>
> >> Sweet,
> >> Marko.
> >>
> >> http://markorodriguez.com
> >>
> >> On Jun 23, 2015, at 1:54 PM, Marko Rodriguez <[email protected]>
> wrote:
> >>
> >>> Hello,
> >>>
> >>> And here we go:
> >>>
> >>>
> >>>
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/s
> >>>
> rc/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/Matc
> >>> hStep.java#L557
> >>>
> >>> Filters always go first and then within those categories, sorted by
> >>> their selectivity (i.e. endCounts / startCounts).
> >>>
> >>> Marko.
> >>>
> >>> http://markorodriguez.com
> >>>
> >>> On Jun 23, 2015, at 9:37 AM, Marko Rodriguez <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Last night I was reading on the iPad about how databases do query
> >>>> optimization. I have never studied this area of computing so I wanted
> >>>> to get comfortable with the terminology and general techniques.
> >>>> Unfortunately, everything on the topic is a PDF on Elsevier, Springer,
> >>>> etc. And of course, thick and math'd out… Fortunately, I found this
> >>>> neat web page:
> >>>>
> >>>>
> >>>>
> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/DeveloperGuide/SPARQL/SP
> >>>> ARQL%20Optimization
> >>>>
> >>>>
> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/DeveloperGuide/SPARQL/I
> >>>> mplementing%20Custom%20Optimizers
> >>>>
> >>>> Here are some things I liked:
> >>>>
> >>>>    1. "Evaluate FILTER at the earliest possible point."
> >>>>            - we currently do this in CountMatchAlgorithm where all
> "where()"
> >>>> patterns are sorted to the top.
> >>>>
> >>>>
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-cor
> >>>>
> e/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/
> >>>> MatchStep.java#L546
> >>>>    2. "Evaluate the most selective patterns first."
> >>>>            - we currently "half do this" as we can only evaluate
> patterns that
> >>>> have the specified start-label.
> >>>>            - we can make this better by preferring those that also
> have the
> >>>> specified end-label.
> >>>>
> >>>>
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-cor
> >>>>
> e/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/
> >>>> MatchStep.java#L536 (required labels are always start-labels)
> >>>>
> >>>> On some other webpage somewhere, someone talked about the "selectivity
> >>>> of clauses" as a function of how much data goes in and how much data
> >>>> comes out. We gather this information with MatchAlgorith.recordStart()
> >>>> and MatchAlgorithm.recordEnd().
> >>>>
> >>>>
> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/
> >>>>
> src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/Ma
> >>>> tchStep.java#L543
> >>>>            - we currently only record how much data comes out in
> >>>> CountMatchAlgorithm.
> >>>>
> >>>> With both in and out counts, we have a cost function:
> >>>>
> >>>>    filter % -- where()-patterns: out / in
> >>>>            11 traversers enter, 3 traverser leave -- filter % = 0.27
> >>>>    growth % -- match()-patterns: out / in
> >>>>            4 traverser enter, 5000 traverser leave -- growth % =
> 1250.00
> >>>>
> >>>> In essence, the %'s are still just based on counts. Thus,
> >>>> CountMatchAlgorithm could have the following struct.
> >>>>
> >>>> Record {
> >>>>  Traversal<?,?> traversal
> >>>>    // the pattern
> >>>>  short dependencyCluster
> >>>>    // some patterns require the output of other patterns (a
> >>>> start-end-label dependency tree is calculated in
> >>>> CountMatchAlgorithm.initialize().)
> >>>>  char patternType
> >>>>    // w (where) and m (match)
> >>>>  float cost
> >>>>    // filter and growth %
> >>>> }
> >>>>
> >>>> Now the global sorting of these records is (in Gremlin notation for
> >>>> ease of readability):
> >>>>
> >>>>    order.by(dependencyCluster).thenBy(patternType).thenBy(cost)
> >>>>
> >>>> At the moment of execution for a particular traverser, it would have
> >>>> this global ordering. However, it would want to do (2) above and pick
> a
> >>>> pattern that it has both a start and end for -- thus, it would simply
> >>>> iterate down the list of traversals until it found one it had both
> >>>> start/end-keys for -- if none, then first one it had a start key for.
> >>>>
> >>>> Probably should have just implemented it instead of writing an email
> >>>> as its pretty trivial to do.
> >>>>
> >>>> Take care,
> >>>> Marko.
> >>>>
> >>>> http://markorodriguez.com
> >>>>
> >>>
> >>
> >
> >
> >
> >
>
>

Re: Learning from a SPARQL Query Optimizer webpage.

Reply via email to