I liked this textbook from back in the day: http://www.db-book.com/
Though, it contains a lot of information and only one chapter on query optimization. On Wed, Jun 24, 2015 at 6:27 AM Marko Rodriguez <[email protected]> wrote: > Hey Rob, > > I didn't know you wrote that page. Thanks for doing so. > > You don't know how hard it is to find good practical information on > database query optimizers. Yesterday I was like "let me just go to Amazon > and buy a 'theory and practice of query optimization'-style book." Nothing. > Bunch of weird "Optimize your MS Access Queries in 5 Minutes!"-type books > and then a bunch of "A relational tuple projection over the space of reals > is a manifold isomorphic to the tensor representation of that same tuple > within the Levinsky-Portrokov metric space"-style books. > > If you know of any solid books that are not too "Oracle DB2 does it like > this…" and more general to any designed query language and data structure, > that would be a nice tid-bit to share. > > Thanks again, > Marko. > > http://markorodriguez.com > > On Jun 24, 2015, at 1:05 AM, Rob Vesse <[email protected]> wrote: > > > Nice > > > > Glad to see I occasionally write some documentation that people find > useful > > > > Rob > > > > On 23/06/2015 21:01, "Marko Rodriguez" <[email protected]> wrote: > > > >> Hi, > >> > >> Sorry for the SPAM. However, here are some runtimes of the new > >> CountMatchAglorithm vs. old CountMatchAlgorithm vs. old MatchStep. > >> > >> https://gist.github.com/okram/d49e1abf48fdc18f77f9 > >> > >> Sweet, > >> Marko. > >> > >> http://markorodriguez.com > >> > >> On Jun 23, 2015, at 1:54 PM, Marko Rodriguez <[email protected]> > wrote: > >> > >>> Hello, > >>> > >>> And here we go: > >>> > >>> > >>> > https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/s > >>> > rc/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/Matc > >>> hStep.java#L557 > >>> > >>> Filters always go first and then within those categories, sorted by > >>> their selectivity (i.e. endCounts / startCounts). > >>> > >>> Marko. > >>> > >>> http://markorodriguez.com > >>> > >>> On Jun 23, 2015, at 9:37 AM, Marko Rodriguez <[email protected]> > >>> wrote: > >>> > >>>> Hi, > >>>> > >>>> Last night I was reading on the iPad about how databases do query > >>>> optimization. I have never studied this area of computing so I wanted > >>>> to get comfortable with the terminology and general techniques. > >>>> Unfortunately, everything on the topic is a PDF on Elsevier, Springer, > >>>> etc. And of course, thick and math'd out… Fortunately, I found this > >>>> neat web page: > >>>> > >>>> > >>>> > https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/DeveloperGuide/SPARQL/SP > >>>> ARQL%20Optimization > >>>> > >>>> > https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/DeveloperGuide/SPARQL/I > >>>> mplementing%20Custom%20Optimizers > >>>> > >>>> Here are some things I liked: > >>>> > >>>> 1. "Evaluate FILTER at the earliest possible point." > >>>> - we currently do this in CountMatchAlgorithm where all > "where()" > >>>> patterns are sorted to the top. > >>>> > >>>> > https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-cor > >>>> > e/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ > >>>> MatchStep.java#L546 > >>>> 2. "Evaluate the most selective patterns first." > >>>> - we currently "half do this" as we can only evaluate > patterns that > >>>> have the specified start-label. > >>>> - we can make this better by preferring those that also > have the > >>>> specified end-label. > >>>> > >>>> > https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-cor > >>>> > e/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/ > >>>> MatchStep.java#L536 (required labels are always start-labels) > >>>> > >>>> On some other webpage somewhere, someone talked about the "selectivity > >>>> of clauses" as a function of how much data goes in and how much data > >>>> comes out. We gather this information with MatchAlgorith.recordStart() > >>>> and MatchAlgorithm.recordEnd(). > >>>> > >>>> > https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/ > >>>> > src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/Ma > >>>> tchStep.java#L543 > >>>> - we currently only record how much data comes out in > >>>> CountMatchAlgorithm. > >>>> > >>>> With both in and out counts, we have a cost function: > >>>> > >>>> filter % -- where()-patterns: out / in > >>>> 11 traversers enter, 3 traverser leave -- filter % = 0.27 > >>>> growth % -- match()-patterns: out / in > >>>> 4 traverser enter, 5000 traverser leave -- growth % = > 1250.00 > >>>> > >>>> In essence, the %'s are still just based on counts. Thus, > >>>> CountMatchAlgorithm could have the following struct. > >>>> > >>>> Record { > >>>> Traversal<?,?> traversal > >>>> // the pattern > >>>> short dependencyCluster > >>>> // some patterns require the output of other patterns (a > >>>> start-end-label dependency tree is calculated in > >>>> CountMatchAlgorithm.initialize().) > >>>> char patternType > >>>> // w (where) and m (match) > >>>> float cost > >>>> // filter and growth % > >>>> } > >>>> > >>>> Now the global sorting of these records is (in Gremlin notation for > >>>> ease of readability): > >>>> > >>>> order.by(dependencyCluster).thenBy(patternType).thenBy(cost) > >>>> > >>>> At the moment of execution for a particular traverser, it would have > >>>> this global ordering. However, it would want to do (2) above and pick > a > >>>> pattern that it has both a start and end for -- thus, it would simply > >>>> iterate down the list of traversals until it found one it had both > >>>> start/end-keys for -- if none, then first one it had a start key for. > >>>> > >>>> Probably should have just implemented it instead of writing an email > >>>> as its pretty trivial to do. > >>>> > >>>> Take care, > >>>> Marko. > >>>> > >>>> http://markorodriguez.com > >>>> > >>> > >> > > > > > > > > > >
