Hi Matthias,

Thanks for the link. Book ordered! (…and will collect dust in my house for many 
years to come).

Marko.

http://markorodriguez.com

On Jun 24, 2015, at 11:57 AM, Matthias Broecheler <[email protected]> wrote:

> I liked this textbook from back in the day:
> http://www.db-book.com/
> 
> Though, it contains a lot of information and only one chapter on query
> optimization.
> 
> On Wed, Jun 24, 2015 at 6:27 AM Marko Rodriguez <[email protected]>
> wrote:
> 
>> Hey Rob,
>> 
>> I didn't know you wrote that page. Thanks for doing so.
>> 
>> You don't know how hard it is to find good practical information on
>> database query optimizers. Yesterday I was like "let me just go to Amazon
>> and buy a 'theory and practice of query optimization'-style book." Nothing.
>> Bunch of weird "Optimize your MS Access Queries in 5 Minutes!"-type books
>> and then a bunch of "A relational tuple projection over the space of reals
>> is a manifold isomorphic to the tensor representation of that same tuple
>> within the Levinsky-Portrokov metric space"-style books.
>> 
>> If you know of any solid books that are not too "Oracle DB2 does it like
>> this…" and more general to any designed query language and data structure,
>> that would be a nice tid-bit to share.
>> 
>> Thanks again,
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> On Jun 24, 2015, at 1:05 AM, Rob Vesse <[email protected]> wrote:
>> 
>>> Nice
>>> 
>>> Glad to see I occasionally write some documentation that people find
>> useful
>>> 
>>> Rob
>>> 
>>> On 23/06/2015 21:01, "Marko Rodriguez" <[email protected]> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Sorry for the SPAM. However, here are some runtimes of the new
>>>> CountMatchAglorithm vs. old CountMatchAlgorithm vs. old MatchStep.
>>>> 
>>>>     https://gist.github.com/okram/d49e1abf48fdc18f77f9
>>>> 
>>>> Sweet,
>>>> Marko.
>>>> 
>>>> http://markorodriguez.com
>>>> 
>>>> On Jun 23, 2015, at 1:54 PM, Marko Rodriguez <[email protected]>
>> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> And here we go:
>>>>> 
>>>>> 
>>>>> 
>> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/s
>>>>> 
>> rc/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/Matc
>>>>> hStep.java#L557
>>>>> 
>>>>> Filters always go first and then within those categories, sorted by
>>>>> their selectivity (i.e. endCounts / startCounts).
>>>>> 
>>>>> Marko.
>>>>> 
>>>>> http://markorodriguez.com
>>>>> 
>>>>> On Jun 23, 2015, at 9:37 AM, Marko Rodriguez <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Last night I was reading on the iPad about how databases do query
>>>>>> optimization. I have never studied this area of computing so I wanted
>>>>>> to get comfortable with the terminology and general techniques.
>>>>>> Unfortunately, everything on the topic is a PDF on Elsevier, Springer,
>>>>>> etc. And of course, thick and math'd out… Fortunately, I found this
>>>>>> neat web page:
>>>>>> 
>>>>>> 
>>>>>> 
>> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/DeveloperGuide/SPARQL/SP
>>>>>> ARQL%20Optimization
>>>>>> 
>>>>>> 
>> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/DeveloperGuide/SPARQL/I
>>>>>> mplementing%20Custom%20Optimizers
>>>>>> 
>>>>>> Here are some things I liked:
>>>>>> 
>>>>>>   1. "Evaluate FILTER at the earliest possible point."
>>>>>>           - we currently do this in CountMatchAlgorithm where all
>> "where()"
>>>>>> patterns are sorted to the top.
>>>>>> 
>>>>>> 
>> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-cor
>>>>>> 
>> e/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/
>>>>>> MatchStep.java#L546
>>>>>>   2. "Evaluate the most selective patterns first."
>>>>>>           - we currently "half do this" as we can only evaluate
>> patterns that
>>>>>> have the specified start-label.
>>>>>>           - we can make this better by preferring those that also
>> have the
>>>>>> specified end-label.
>>>>>> 
>>>>>> 
>> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-cor
>>>>>> 
>> e/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/
>>>>>> MatchStep.java#L536 (required labels are always start-labels)
>>>>>> 
>>>>>> On some other webpage somewhere, someone talked about the "selectivity
>>>>>> of clauses" as a function of how much data goes in and how much data
>>>>>> comes out. We gather this information with MatchAlgorith.recordStart()
>>>>>> and MatchAlgorithm.recordEnd().
>>>>>> 
>>>>>> 
>> https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/
>>>>>> 
>> src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/map/Ma
>>>>>> tchStep.java#L543
>>>>>>           - we currently only record how much data comes out in
>>>>>> CountMatchAlgorithm.
>>>>>> 
>>>>>> With both in and out counts, we have a cost function:
>>>>>> 
>>>>>>   filter % -- where()-patterns: out / in
>>>>>>           11 traversers enter, 3 traverser leave -- filter % = 0.27
>>>>>>   growth % -- match()-patterns: out / in
>>>>>>           4 traverser enter, 5000 traverser leave -- growth % =
>> 1250.00
>>>>>> 
>>>>>> In essence, the %'s are still just based on counts. Thus,
>>>>>> CountMatchAlgorithm could have the following struct.
>>>>>> 
>>>>>> Record {
>>>>>> Traversal<?,?> traversal
>>>>>>   // the pattern
>>>>>> short dependencyCluster
>>>>>>   // some patterns require the output of other patterns (a
>>>>>> start-end-label dependency tree is calculated in
>>>>>> CountMatchAlgorithm.initialize().)
>>>>>> char patternType
>>>>>>   // w (where) and m (match)
>>>>>> float cost
>>>>>>   // filter and growth %
>>>>>> }
>>>>>> 
>>>>>> Now the global sorting of these records is (in Gremlin notation for
>>>>>> ease of readability):
>>>>>> 
>>>>>>   order.by(dependencyCluster).thenBy(patternType).thenBy(cost)
>>>>>> 
>>>>>> At the moment of execution for a particular traverser, it would have
>>>>>> this global ordering. However, it would want to do (2) above and pick
>> a
>>>>>> pattern that it has both a start and end for -- thus, it would simply
>>>>>> iterate down the list of traversals until it found one it had both
>>>>>> start/end-keys for -- if none, then first one it had a start key for.
>>>>>> 
>>>>>> Probably should have just implemented it instead of writing an email
>>>>>> as its pretty trivial to do.
>>>>>> 
>>>>>> Take care,
>>>>>> Marko.
>>>>>> 
>>>>>> http://markorodriguez.com
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 

Reply via email to