[jira] [Updated] (TINKERPOP3-813) [Proposal] Make the Gremlin Graph Traversal Machine and Instruction Set Explicit

Marko A. Rodriguez (JIRA) Thu, 27 Aug 2015 10:38:34 -0700

     [ 
https://issues.apache.org/jira/browse/TINKERPOP3-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Marko A. Rodriguez updated TINKERPOP3-813:
------------------------------------------
    Description: 
There is a lot of confusion as to "what is Gremlin?" Is it Groovy? Whats 
Gremlin-Scala? Does it only work for the JVM? Is it a language or is it an 
execution engine? Is Gremlin OLAP different from Gremlin OLTP? How is Cypher 
related to Gremlin? Who is John Galt?

For TinkerPop 3.2.0, we need to make explicit *The Gremlin Graph Traversal 
Machine and Instruction Set*. Here is a direct mapping to the Java world so 
people can get a grasp of where we are going with this.

{code}
Gremlin-Java8 == Java programming language
Gremlin-Scala == Scala programming language
Gremlin-Groovy == Groovy programming language
...
Gremlin steps == Java instruction set
Gremlin machine == Java virtual machine
{code}

Lets go through each one of these to flush out the concepts.

*Gremlin as a Language.*

When you see:

{code}
g.V().as("a").out("knows").as("b").
  select("a","b").
    by("name").
    by("age")
{code}

You are bearing witness to Gremlin-Java8. This is a fluent-style graph 
traversal API that ultimately writes out Gremlin steps. The Gremlin language 
(i.e. the fluent-style for doing graph traversing) can be embedded in any host 
language that supports function composition and function nesting. Because of 
this simple requirement, there exists Gremlin-Groovy, Gremlin-Scala, 
Gremlin-Clojure, etc. However, if the host language is NOT on the JVM, then it 
doesn't compile directly to Gremlin steps. How do we support other languages 
outside the JVM? There are two ways currently, via GremlinServer, but also, via 
the step-syntax described next.

*Gremlin as an Instruction Set.*

{code}
g.V().as("a").out("knows").as("b").
  select("a","b").
    by("name").
    by("age")
{code}

The above Gremlin-Java8 query gets compiled down to a step sequence called a 
"traversal." Here is the *unique* string representation of the query above.

{code}
[GraphStep([],vertex)@[a], VertexStep(OUT,[knows],vertex)@[b], SelectStep([a, 
b],[value(name), value(age)])]
{code}

The steps are the primitives of the Gremlin graph traversal machine. They are 
the parameterized instructions that the machine ultimately executes. The 
Gremlin instruction set is approximately 30 steps. These are sufficient to 
provide general purpose computing and what is typically required to express the 
common motifs of any user query. Moreover, with Gremlin3 there is no need for 
lambdas so the step syntax above is sufficient to completely express the 
traversal computation.

Note that ANY one can create a "Gremlin program" (traversal). You can imagine a 
compiler that takes SPARQL and writes out Gremlin steps. Neo4j's Cypher could 
be compiled to write out Gremlin steps. Realize that SPARQL is NOT a JVM 
language. Thus, PHP/Python/Go/Rust/Lisp can all have Gremlin "languages" that 
simply write out traversal steps in the step-syntax presented previously. Why 
would anyone want to do this? -- because of the Gremlin graph traversal machine.

*Gremlin as a Virtual Machine.*

The Gremlin graph traversal machine can be executed in OLTP or OLAP. What does 
that mean? It means Gremlin can be executed single machine or distributed 
across a multi-machine compute cluster. Best of all, NOTHING changes about the 
language or the instruction set. That is the promise of the Gremlin machine. 
Most importantly, the Gremlin machine is agnostic to the underlying graph 
system. It works over Titan, Neo4j, OrientDB, Giraph, Hadoop, Spark, 
BlazeGraph, InfiniteGraph, SQLg, Bitsy, IBM Graph Store, AmazonDynamoDB, 
ArangoDB, etc. The Gremlin machine simply executes the steps and a result set 
is returned. It doesn't care if the steps were created from Gremlin-Java8, 
Cypher, SPARQL, MyGraphLanguage, R:Statistics, NetLogo, punch cards, etc. It 
works at the step level, not the human-writable level. This is the power of 
Gremlin. Gremlin is to the graph world as Java is the programming world. And 
best of all, Gremlin is owned by the Apache Software Foundation and thus, free 
to use for any reason commercial or otherwise.

----

So, whats next?

  1. Re-read http://arxiv.org/abs/1508.03843.
  2. Hone the Gremlin step library for TinkerPop 3.2.0
  3. Write a blog explaining this concept for developers to understand.
  4. Write a SPARQL(--) (toy) compiler that generates Gremlin steps for for the 
blog in 3.

Hopefully that answers the question -- "What is Gremlin?" and will allow 
vendors to realize how they can leverage the execution engine of Gremlin for 
their graph product (and if they want their own query language, simply have it 
compile to a Gremlin traversal). 

Finally, note that I did not discuss {{TraversalStrategies}} (i.e. compiler 
optimizers). These are important, but do not effect the conceptual distinctions 
made above or their practical application.


  was:
There is a lot of confusion as to "what is Gremlin?" Is it Groovy? Whats 
Gremlin-Scala? Does it only work for the JVM? Is it a language or is it an 
execution engine? Is Gremlin OLAP different from Gremlin OLTP? How is Cypher 
related to Gremlin? Who is John Galt?

For TinkerPop 3.2.0, we need to make explicit *The Gremlin Graph Traversal 
Machine and Instruction Set*. Here is a direct mapping to the Java world so 
people can get a grasp of where we are going with this.

{code}
Gremlin-Java8 == Java programming language
Gremlin-Scala == Scala programming language
Gremlin-Groovy == Groovy programming language
...
Gremlin steps == Java instruction set
Gremlin machine == Java virtual machine
{code}

Lets go through each one of these to flush out the concepts.

*Gremlin as a Language.*

When you see:

{code}
g.V().as("a").out("knows").as("b").
  select("a","b").
    by("name").
    by("age")
{code}

You are bearing witness to Gremlin-Java8. This is a fluent-style graph 
traversal API that ultimately writes out Gremlin steps. The Gremlin language 
(i.e. the fluent-style for doing graph traversing) can be embedded in any host 
language that supports function composition and function nesting. Because of 
this simple requirement, there exists Gremlin-Groovy, Gremlin-Scala, 
Gremlin-Clojure, etc. However, if the host language is NOT on the JVM, then it 
doesn't compile directly to Gremlin steps. How do we support other languages 
outside the JVM? There are two ways currently, via GremlinServer, but also, via 
the step-syntax described next.

*Gremlin as an Instruction Set.*

{code}
g.V().as("a").out("knows").as("b").
  select("a","b").
    by("name").
    by("age")
{code}

The above Gremlin-Java8 query gets compiled down to a step sequence called a 
"traversal." Here is the *unique* string representation of the query above.

{code}
[GraphStep([],vertex)@[a], VertexStep(OUT,[knows],vertex)@[b], SelectStep([a, 
b],[value(name), value(age)])]
{code}

The steps are the primitives of the Gremlin graph traversal machine. They are 
the parameterized instructions that the machine ultimately executes. The 
Gremlin instruction set is approximately 30 steps. These are sufficient to 
provide general purpose computing and what is typically required to express the 
common motifs of any user query. Moreover, with Gremlin3 there is no need for 
lambdas so the step syntax above is sufficient to completely express the 
traversal computation.

Note that ANY one can create a "Gremlin program" (traversal). You can imagine a 
compiler that takes SPARQL and writes out Gremlin steps. Neo4j's Cypher could 
be compiled to write out Gremlin steps. Realize that SPARQL is NOT a JVM 
language. Thus, PHP/Python/Go/Rust/Lisp can all have Gremlin "languages" that 
simply write out traversal steps in the step-syntax presented previously. Why 
would anyone want to do this? -- because of the Gremlin graph traversal machine.

*Gremlin as a Virtual Machine.*

The Gremlin graph traversal machine can be executed in OLTP or OLAP. What does 
that mean? It means Gremlin can be executed single machine or distributed 
across a multi-machine compute cluster. Best of all, NOTHING changes about the 
language or the instruction set. That is the promise of the Gremlin machine. 
Most importantly, the Gremlin machine is agnostic to the underlying graph 
system. It works over Titan, Neo4j, OrientDB, Giraph, Hadoop, Spark, 
BlazeGraph, InfiniteGraph, SQLg, Bitsy, IBM Graph Store, AmazonDynamoDB, 
ArangoDB, etc. The Gremlin machine simply executes the steps and a result set 
is returned. It doesn't care if the steps were created from Gremlin-Java8, 
Cypher, SPARQL, MyGraphLanguage, R:Statistics, NetLogo, punch cards, etc. It 
works at the step level, not the human-writable level. This is the power of 
Gremlin. Gremlin is to the graph world as Java is the programming world. And 
best of all, Gremlin is owned by the Apache Software Foundation and thus, free 
to use for any reason commercial or otherwise.

----

So, whats next?

  1. Re-read http://arxiv.org/abs/1508.03843.
  2. Hone the Gremlin step library for TinkerPop 3.2.0
  3. Write a blog explaining this concept for developers to understand.
  4. Write a SPARQL(--) (toy) compiler that generates Gremlin steps for 3.

Hopefully that answers the question -- "What is Gremlin?" and will allow 
vendors to realize how they can leverage the execution engine of Gremlin for 
their graph product (and if they want their own query language, simply have it 
compile to a Gremlin traversal). 

Finally, note that I did not discuss {{TraversalStrategies}} (i.e. compiler 
optimizers). These are important, but do not effect the conceptual distinctions 
made above or their practical application.



> [Proposal] Make the Gremlin Graph Traversal Machine and Instruction Set 
> Explicit
> --------------------------------------------------------------------------------
>
>                 Key: TINKERPOP3-813
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-813
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>
> There is a lot of confusion as to "what is Gremlin?" Is it Groovy? Whats 
> Gremlin-Scala? Does it only work for the JVM? Is it a language or is it an 
> execution engine? Is Gremlin OLAP different from Gremlin OLTP? How is Cypher 
> related to Gremlin? Who is John Galt?
> For TinkerPop 3.2.0, we need to make explicit *The Gremlin Graph Traversal 
> Machine and Instruction Set*. Here is a direct mapping to the Java world so 
> people can get a grasp of where we are going with this.
> {code}
> Gremlin-Java8 == Java programming language
> Gremlin-Scala == Scala programming language
> Gremlin-Groovy == Groovy programming language
> ...
> Gremlin steps == Java instruction set
> Gremlin machine == Java virtual machine
> {code}
> Lets go through each one of these to flush out the concepts.
> *Gremlin as a Language.*
> When you see:
> {code}
> g.V().as("a").out("knows").as("b").
>   select("a","b").
>     by("name").
>     by("age")
> {code}
> You are bearing witness to Gremlin-Java8. This is a fluent-style graph 
> traversal API that ultimately writes out Gremlin steps. The Gremlin language 
> (i.e. the fluent-style for doing graph traversing) can be embedded in any 
> host language that supports function composition and function nesting. 
> Because of this simple requirement, there exists Gremlin-Groovy, 
> Gremlin-Scala, Gremlin-Clojure, etc. However, if the host language is NOT on 
> the JVM, then it doesn't compile directly to Gremlin steps. How do we support 
> other languages outside the JVM? There are two ways currently, via 
> GremlinServer, but also, via the step-syntax described next.
> *Gremlin as an Instruction Set.*
> {code}
> g.V().as("a").out("knows").as("b").
>   select("a","b").
>     by("name").
>     by("age")
> {code}
> The above Gremlin-Java8 query gets compiled down to a step sequence called a 
> "traversal." Here is the *unique* string representation of the query above.
> {code}
> [GraphStep([],vertex)@[a], VertexStep(OUT,[knows],vertex)@[b], SelectStep([a, 
> b],[value(name), value(age)])]
> {code}
> The steps are the primitives of the Gremlin graph traversal machine. They are 
> the parameterized instructions that the machine ultimately executes. The 
> Gremlin instruction set is approximately 30 steps. These are sufficient to 
> provide general purpose computing and what is typically required to express 
> the common motifs of any user query. Moreover, with Gremlin3 there is no need 
> for lambdas so the step syntax above is sufficient to completely express the 
> traversal computation.
> Note that ANY one can create a "Gremlin program" (traversal). You can imagine 
> a compiler that takes SPARQL and writes out Gremlin steps. Neo4j's Cypher 
> could be compiled to write out Gremlin steps. Realize that SPARQL is NOT a 
> JVM language. Thus, PHP/Python/Go/Rust/Lisp can all have Gremlin "languages" 
> that simply write out traversal steps in the step-syntax presented 
> previously. Why would anyone want to do this? -- because of the Gremlin graph 
> traversal machine.
> *Gremlin as a Virtual Machine.*
> The Gremlin graph traversal machine can be executed in OLTP or OLAP. What 
> does that mean? It means Gremlin can be executed single machine or 
> distributed across a multi-machine compute cluster. Best of all, NOTHING 
> changes about the language or the instruction set. That is the promise of the 
> Gremlin machine. Most importantly, the Gremlin machine is agnostic to the 
> underlying graph system. It works over Titan, Neo4j, OrientDB, Giraph, 
> Hadoop, Spark, BlazeGraph, InfiniteGraph, SQLg, Bitsy, IBM Graph Store, 
> AmazonDynamoDB, ArangoDB, etc. The Gremlin machine simply executes the steps 
> and a result set is returned. It doesn't care if the steps were created from 
> Gremlin-Java8, Cypher, SPARQL, MyGraphLanguage, R:Statistics, NetLogo, punch 
> cards, etc. It works at the step level, not the human-writable level. This is 
> the power of Gremlin. Gremlin is to the graph world as Java is the 
> programming world. And best of all, Gremlin is owned by the Apache Software 
> Foundation and thus, free to use for any reason commercial or otherwise.
> ----
> So, whats next?
>   1. Re-read http://arxiv.org/abs/1508.03843.
>   2. Hone the Gremlin step library for TinkerPop 3.2.0
>   3. Write a blog explaining this concept for developers to understand.
>   4. Write a SPARQL(--) (toy) compiler that generates Gremlin steps for for 
> the blog in 3.
> Hopefully that answers the question -- "What is Gremlin?" and will allow 
> vendors to realize how they can leverage the execution engine of Gremlin for 
> their graph product (and if they want their own query language, simply have 
> it compile to a Gremlin traversal). 
> Finally, note that I did not discuss {{TraversalStrategies}} (i.e. compiler 
> optimizers). These are important, but do not effect the conceptual 
> distinctions made above or their practical application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (TINKERPOP3-813) [Proposal] Make the Gremlin Graph Traversal Machine and Instruction Set Explicit

Reply via email to