Hello,
There are two ways of going about "more information about the database."
  1. The provider has access to the Traversal and can rewrite it as needed.
     - e.g. XXXGraphStepStrategy implementations selecting the appropriate
       indices.
  2. The provider provides more information to TinkerPop so that TinkerPop
     can do the work.
     - e.g. MatchStep (sorta), where we infer the graph statistics from
       runtime performance.
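To make (1) concrete, here is a toy sketch of the kind of decision a provider's strategy makes when it folds filters into its own graph step. It does not use the TinkerPop API: the Index class, the selectIndex method, and the hit estimates are all made up for illustration, and a real provider would get these numbers from its own internals.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class IndexSelectionSketch {
    // Hypothetical description of a provider index: the property key it
    // covers and the provider's own estimate of how many elements a lookup returns.
    static final class Index {
        final String key;
        final long estimatedHits;
        Index(String key, long estimatedHits) {
            this.key = key;
            this.estimatedHits = estimatedHits;
        }
    }

    // Among the indices that cover one of the filtered keys, pick the one
    // with the fewest estimated hits. Empty means "fall back to a full scan".
    static Optional<Index> selectIndex(Set<String> filteredKeys, List<Index> indices) {
        return indices.stream()
                .filter(ix -> filteredKeys.contains(ix.key))
                .min(Comparator.comparingLong((Index ix) -> ix.estimatedHits));
    }

    public static void main(String[] args) {
        List<Index> indices = Arrays.asList(
                new Index("name", 3),
                new Index("age", 5000),
                new Index("city", 120));
        // Models the filters of something like g.V().has("age", 29).has("name", "marko").
        Set<String> filteredKeys = new HashSet<>(Arrays.asList("age", "name"));
        String choice = selectIndex(filteredKeys, indices)
                .map(ix -> ix.key)
                .orElse("full scan");
        System.out.println(choice); // prints name
    }
}
```

The point is only that the selectivity knowledge lives with the provider, so the rewrite has to happen on the provider's side.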
There have been various discussions on this list (primarily raised by Pieter
Martin) about getting schema information to TinkerPop. However, as with
indices, do we want to make that explicit, given every provider's differences
in how such matters are handled? Thus, it's a tradeoff between the provider
doing the heavy lifting (1) and TinkerPop doing it (2). I think there will
always be a balance, in that providers will always have to do their own
XXXGraphStep implementations where they can determine the selectivity of
various indices internally. For (2), one of the big pushes for 3.2.0 is the
development of "RuntimeTraversalStrategy", which will generalize the
"MatchAlgorithm" package (and thus, kill it) to support runtime traversal
ordering for other areas of Gremlin such as OR, AND, linear reversal, etc.
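For (2), the idea of inferring statistics from runtime performance can be sketched roughly as below. This is a toy model under my own assumptions, not the MatchAlgorithm code and not the planned RuntimeTraversalStrategy: PatternStats, record, multiplicity, and plan are hypothetical names, and the counts are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

class RuntimeOrderingSketch {
    // Hypothetical per-pattern bookkeeping: how many traversers entered a
    // pattern and how many came out of it so far.
    static final class PatternStats {
        final String name;
        long in = 0;
        long out = 0;
        PatternStats(String name) { this.name = name; }

        void record(long entered, long emitted) {
            in += entered;
            out += emitted;
        }

        // Observed branching factor. Patterns with no observations yet
        // report 1.0 so they are neither favored nor starved.
        double multiplicity() {
            return in == 0 ? 1.0 : (double) out / in;
        }
    }

    // Order patterns so the most selective one (lowest observed
    // multiplicity) is evaluated first.
    static List<String> plan(Collection<PatternStats> stats) {
        List<PatternStats> sorted = new ArrayList<>(stats);
        sorted.sort(Comparator.comparingDouble(PatternStats::multiplicity));
        List<String> names = new ArrayList<>();
        for (PatternStats s : sorted) {
            names.add(s.name);
        }
        return names;
    }

    public static void main(String[] args) {
        PatternStats knows = new PatternStats("a-knows->b");
        PatternStats created = new PatternStats("a-created->c");
        PatternStats named = new PatternStats("a.name=marko");
        knows.record(100, 800);   // fans out: 8.0 per traverser
        created.record(100, 150); // 1.5 per traverser
        named.record(100, 1);     // strong filter: 0.01 per traverser
        System.out.println(plan(Arrays.asList(knows, created, named)));
        // prints [a.name=marko, a-created->c, a-knows->b]
    }
}
```

No schema or index knowledge is needed here; the ordering comes purely from what the engine has observed while running, which is the appeal of (2).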
Marko.
http://markorodriguez.com
On Nov 2, 2015, at 11:12 AM, Matthias Broecheler <[email protected]> wrote:
> I think this is a compelling argument, however, it has one major flaw:
> Gremlin is currently not aware of the schema or any statistics of the
> underlying graph database.
> For "simple" optimizations that's not too bad - the underlying graph database
> can simply replace the respective step in the traversal with an optimized
> step. That's what Titan does for TitanGraphStep or TitanVertexStep. Those are
> also interesting as there is quite a bit of logic you need to put in there to
> understand what you can reorder and pull into a step.
>
> However, it gets pretty complicated when you look at something like MatchStep,
> which is crucial for most of the arguments that make Gremlin a general
> "traversal machine". Both SPARQL->Gremlin and SQL->Gremlin rely heavily on
> MatchStep.
> Now, looking more closely at MatchStep there seems to be no way for Titan to
> instill its knowledge of the schema or statistics or indexes or anything into
> the algorithm that executes MatchStep.
> So, for Titan to get a "good" implementation of MatchStep, Titan will need to
> effectively reimplement it. And, arguably, that's a big part of a query
> language (i.e. the entire declarative piece of Gremlin).
>
> So, the question then becomes: Does the argument of Gremlin being a universal
> traversal machine only hold for the imperative parts or can it be extended to
> the declarative aspects as well?
>
> On Sat, Oct 31, 2015 at 9:06 AM Marko Rodriguez <[email protected]> wrote:
> Hi,
>
> Yesterday I was HipChatting with Alex Popescu (cc'd) about "there is no need
> for a standard query language" as there is no need for a "standard
> programming language." He said something to the effect of "that is a strong
> argument, however there will then be discussions of virtual machine execution
> vs. native execution."
>
> Last night I was thinking -- "hmmm, that will be a bad argument to make." Why?
>
> Gremlin shouldn't be touted as a "virtual machine" but as a "traversal
> machine" (an execution engine). When Gremlin talks to an underlying graph
> system it's talking to TinkerPop ("Blueprints") and then to the native API of
> the graph system. For systems that have TinkerPop as their native API
> (Titan/Bitsy/etc.) Gremlin is not a "virtual machine." For systems that don't
> (OrientDB/Neo4j/etc.), the cost of the indirection in going from the TinkerPop
> API to the graph system's native API is trivial as it's typically just object
> wrapping on the short-lived object heap (we will amortize this cost later --
> watch). Next, all graph systems maintain an "execution engine" for their
> respective query language. That is, OrientSQL, Cypher, SPARQL ultimately talk
> to their graph system's API: OrientDB Java API, Neo4j Java API, and Sesame or
> Jena, respectively. Gremlin does the same thing, it just talks to TinkerPop
> ("Blueprints") first, which then talks to those APIs. What makes Gremlin neat
> is that the execution engine and the language are not strongly coupled as it's
> very easy for any graph language to compile to the Gremlin machine. So there
> is no relative cost in the language->machine translation, the cost (though
> minor -- wait for it) is in the machine->API translation. However, given the
> conceptual simplicity (and engineering) of the Gremlin machine, those costs
> are quickly subsumed. With MatchStep's runtime optimizer, traverser bulking,
> LazyBarriers, and (most importantly) provider specific compiler strategies
> (see Titan's beautiful use of these), Gremlin can be faster than the
> provider's "native query" language. In fact, some internal benchmarking I've
> done has shown that Gremlin is indeed as fast as or faster than the native
> language of the graph system, where sometimes those speed differences are
> anywhere from 5x to the life of the universe. Thus, the cost of
> TinkerPopAPI->NativeAPI is so trivial at that point, it's not worth even
> discussing the "cost of virtualization." I suspect (though this is complete
> speculation at this point) that X-Language->GremlinMachine->Y-System could be
> faster than
> X-Language->Y-System given Gremlin's current (and future) compiler/engine
> design and evolution.
>
> Thus, Gremlin shouldn't be seen as a "virtual machine," but as a "traversal
> machine" that anyone can connect to their graph system. It supports any
> graph language that compiles to it. It is an efficient/simple OLTP/OLAP
> execution engine pre-written for you.
>
> Thanks,
> Marko.
>
> http://markorodriguez.com
>
> On Oct 31, 2015, at 12:37 AM, pieter <[email protected]> wrote:
>
>> Yeah, Cypher/Sparql/OrientQL whatever does not compete with Gremlin.
>> Gremlin enables all of them.
>>
>> Cheers
>> Pieter
>>
>> On 31/10/2015 00:26, Marko Rodriguez wrote:
>>> Hello,
>>>
>>> While these ideas are not new to people on TinkerPop3, I had a nice
>>> revelation that I expressed in the following tweet series.
>>>
>>> https://twitter.com/twarko/status/660215611117535232
>>>
>>> Why is there no "standard programming language?" Different programming
>>> languages are good at different things.
>>> What makes more languages emerge and grow? A virtual machine abstraction.
>>> The JVM is the breeding ground for programming languages.
>>> Java projects can have many programming languages in them. No worries.
>>>
>>> There should be no "standard graph language?" Different graph
>>> languages are good at different things.
>>> What makes more graph languages emerge and grow? A traversal machine
>>> abstraction.
>>> The Gremlin traversal machine can be the breeding ground for traversal
>>> languages.
>>> TinkerPop projects can have many graph languages in them. No worries.
>>>
>>> Take care,
>>> Marko.
>>>
>>> http://markorodriguez.com
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Gremlin-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/gremlin-users/C31C88C0-DE7B-4383-94B8-8E8EEAA82A69%40gmail.com.
>>> For more options, visit https://groups.google.com/d/optout.