Hi Matthias, Thanks for the link. Book ordered! (…and will collect dust in my house for many years to come).
Marko. http://markorodriguez.com On Jun 24, 2015, at 11:57 AM, Matthias Broecheler <[email protected]> wrote: > I liked this textbook from back in the day: > http://www.db-book.com/ > > Though, it contains a lot of information and only one chapter on query > optimization. > > On Wed, Jun 24, 2015 at 6:27 AM Marko Rodriguez <[email protected]> > wrote: > >> Hey Rob, >> >> I didn't know you wrote that page. Thanks for doing so. >> >> You don't know how hard it is to find good practical information on >> database query optimizers. Yesterday I was like "let me just go to Amazon >> and buy a 'theory and practice of query optimization'-style book." Nothing. >> Bunch of weird "Optimize your MS Access Queries in 5 Minutes!"-type books >> and then a bunch of "A relational tuple projection over the space of reals >> is a manifold isomorphic to the tensor representation of that same tuple >> within the Levinsky-Portrokov metric space"-style books. >> >> If you know of any solid books that are not too "Oracle DB2 does it like >> this…" and more general to any designed query language and data structure, >> that would be a nice tid-bit to share. >> >> Thanks again, >> Marko. >> >> http://markorodriguez.com >> >> On Jun 24, 2015, at 1:05 AM, Rob Vesse <[email protected]> wrote: >> >>> Nice >>> >>> Glad to see I occasionally write some documentation that people find >> useful >>> >>> Rob >>> >>> On 23/06/2015 21:01, "Marko Rodriguez" <[email protected]> wrote: >>> >>>> Hi, >>>> >>>> Sorry for the SPAM. However, here are some runtimes of the new >>>> CountMatchAglorithm vs. old CountMatchAlgorithm vs. old MatchStep. >>>> >>>> https://gist.github.com/okram/d49e1abf48fdc18f77f9 >>>> >>>> Sweet, >>>> Marko. >>>> >>>> http://markorodriguez.com >>>> >>>> On Jun 23, 2015, at 1:54 PM, Marko Rodriguez <[email protected]> >> wrote: >>>> >>>>> Hello, >>>>> >>>>> And here we go: >>>>> >>>>> >>>>> >> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/s >>>>> >> rc/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/Matc >>>>> hStep.java#L557 >>>>> >>>>> Filters always go first and then within those categories, sorted by >>>>> their selectivity (i.e. endCounts / startCounts). >>>>> >>>>> Marko. >>>>> >>>>> http://markorodriguez.com >>>>> >>>>> On Jun 23, 2015, at 9:37 AM, Marko Rodriguez <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Last night I was reading on the iPad about how databases do query >>>>>> optimization. I have never studied this area of computing so I wanted >>>>>> to get comfortable with the terminology and general techniques. >>>>>> Unfortunately, everything on the topic is a PDF on Elsevier, Springer, >>>>>> etc. And of course, thick and math'd out… Fortunately, I found this >>>>>> neat web page: >>>>>> >>>>>> >>>>>> >> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/DeveloperGuide/SPARQL/SP >>>>>> ARQL%20Optimization >>>>>> >>>>>> >> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/DeveloperGuide/SPARQL/I >>>>>> mplementing%20Custom%20Optimizers >>>>>> >>>>>> Here are some things I liked: >>>>>> >>>>>> 1. "Evaluate FILTER at the earliest possible point." >>>>>> - we currently do this in CountMatchAlgorithm where all >> "where()" >>>>>> patterns are sorted to the top. >>>>>> >>>>>> >> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-cor >>>>>> >> e/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ >>>>>> MatchStep.java#L546 >>>>>> 2. "Evaluate the most selective patterns first." >>>>>> - we currently "half do this" as we can only evaluate >> patterns that >>>>>> have the specified start-label. >>>>>> - we can make this better by preferring those that also >> have the >>>>>> specified end-label. >>>>>> >>>>>> >> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-cor >>>>>> >> e/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ >>>>>> MatchStep.java#L536 (required labels are always start-labels) >>>>>> >>>>>> On some other webpage somewhere, someone talked about the "selectivity >>>>>> of clauses" as a function of how much data goes in and how much data >>>>>> comes out. We gather this information with MatchAlgorith.recordStart() >>>>>> and MatchAlgorithm.recordEnd(). >>>>>> >>>>>> >> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/ >>>>>> >> src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/Ma >>>>>> tchStep.java#L543 >>>>>> - we currently only record how much data comes out in >>>>>> CountMatchAlgorithm. >>>>>> >>>>>> With both in and out counts, we have a cost function: >>>>>> >>>>>> filter % -- where()-patterns: out / in >>>>>> 11 traversers enter, 3 traverser leave -- filter % = 0.27 >>>>>> growth % -- match()-patterns: out / in >>>>>> 4 traverser enter, 5000 traverser leave -- growth % = >> 1250.00 >>>>>> >>>>>> In essence, the %'s are still just based on counts. Thus, >>>>>> CountMatchAlgorithm could have the following struct. >>>>>> >>>>>> Record { >>>>>> Traversal<?,?> traversal >>>>>> // the pattern >>>>>> short dependencyCluster >>>>>> // some patterns require the output of other patterns (a >>>>>> start-end-label dependency tree is calculated in >>>>>> CountMatchAlgorithm.initialize().) >>>>>> char patternType >>>>>> // w (where) and m (match) >>>>>> float cost >>>>>> // filter and growth % >>>>>> } >>>>>> >>>>>> Now the global sorting of these records is (in Gremlin notation for >>>>>> ease of readability): >>>>>> >>>>>> order.by(dependencyCluster).thenBy(patternType).thenBy(cost) >>>>>> >>>>>> At the moment of execution for a particular traverser, it would have >>>>>> this global ordering. However, it would want to do (2) above and pick >> a >>>>>> pattern that it has both a start and end for -- thus, it would simply >>>>>> iterate down the list of traversals until it found one it had both >>>>>> start/end-keys for -- if none, then first one it had a start key for. >>>>>> >>>>>> Probably should have just implemented it instead of writing an email >>>>>> as its pretty trivial to do. >>>>>> >>>>>> Take care, >>>>>> Marko. >>>>>> >>>>>> http://markorodriguez.com >>>>>> >>>>> >>>> >>> >>> >>> >>> >> >>
